Search Results - Inner Mongolia University Library

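Search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "excluding". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016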
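
The query syntax above can be assembled programmatically. The following is a minimal sketch, not part of the library's own tooling: the helper names (`term`, `any_of`, `all_of`, `without`, `year_range`) are hypothetical, and only the field codes and the uppercase, space-delimited operators come from the help text. It rebuilds the two example queries as plain strings.

```python
# Field codes taken from the help text; every function name below is hypothetical.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND (uppercase, one space on each side)."""
    return " AND ".join(clauses)


def without(base: str, excluded: str) -> str:
    """Append a NOT clause to an existing expression."""
    return f"{base} NOT {excluded}"


def year_range(start: int, end: int) -> str:
    """Return a Y=start-end clause."""
    return term("Y", f"{start}-{end}")


# Example 1 from the help text.
query1 = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    year_range(1982, 2016),
)

# Example 2 from the help text.
query2 = all_of(
    without(
        all_of(term("P", "计算机应用与软件"), any_of(term("U", "C++"), term("U", "Basic"))),
        term("K", "Visual"),
    ),
    year_range(2011, 2016),
)

print(query1)
print(query2)
```

Building the string from small helpers keeps the operators uppercase and correctly spaced, which the search rules require. How the catalogue resolves precedence between AND, OR and NOT is not stated in the help text, so the sketch only reproduces the examples as written.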



Search query: "Any field = 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2003"

6,678 records in total; showing records 121-130

Task Navigator: Decomposing Complex Tasks for Multimodal Large Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ma, Feipeng; Zhou, Yizhou; Zhang, Yueyi; Wu, Siying; Zhang, Zheyu; He, Zilong; Rao, Fengyun; Sun, Xiaoyan

Affiliations: Univ Sci & Technol China Hefei Peoples R China; Tencent Inc WeChat Shenzhen Peoples R China; Hefei Comprehens Natl Sci Ctr Inst Artificial Intelligence Hefei Peoples R China

ISBN: (print) 9798350365474

Inspired by the remarkable progress achieved by recent Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) take LLMs as their brains, and have achieved surprising results in many downstream tasks by training on a large amount of task-specific data. However, when faced with complex tasks that require the collaboration of multiple capabilities, existing MLLMs recollect training data and retrain the model, ignoring the systematic utilization of LLMs and their possessed capabilities learned in downstream tasks. Inspired by the way humans tackle complex questions, in this paper, we propose a novel framework called Task Navigator. In our framework, LLMs act as navigators to chart a viable path for solving complex tasks and guide MLLMs through the process step by step. Specifically, LLMs iteratively break down sub-problems and refine them to be more reasonable and answerable, which are subsequently resolved by MLLMs to obtain relevant subanswers, until the LLMs have collected enough information to answer the initial question. Task Navigator provides an effective way to extend MLLMs to tackle complex tasks, thus broadening MLLMs' applicability. To evaluate the performance of the proposed framework, we have curated a carefully designed benchmark called VersaChallenge. Experiments on VersaChallenge demonstrate the effectiveness of our proposed method.

Keywords: Language and vision Multi-modal vision

Universal Guidance for Diffusion Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bansal, Arpit; Chu, Hong-Min; Schwarzschild, Avi; Sengupta, Soumyadip; Goldblum, Micah; Geiping, Jonas; Goldstein, Tom

Affiliations: Univ Maryland College Pk MD 20742 USA; Univ North Carolina Chapel Hill Chapel Hill NC USA; NYU New York NY USA

ISBN: (print) 9798350302493

Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at ***/arpitbansal297/UniversalGuided-Diffusion.

Keywords: Face recognition

ViTA: An Efficient Video-to-Text Algorithm using VLM for RAG-based Video Analysis System

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Arefeen, Md Adnan; Debnath, Biplob; Uddin, Md Yusuf Sarwar; Chakradhar, Srimat

Affiliations: NEC Labs Amer Princeton NJ 08540 USA; Univ Missouri Kansas City MO 64110 USA

ISBN: (print) 9798350365474

Retrieval-augmented generation (RAG) is used in natural language processing (NLP) to provide query-relevant information in enterprise documents to large language models (LLMs). Such enterprise context enables the LLMs to generate more informed and accurate responses. When enterprise data is primarily videos, AI models like vision language models (VLMs) are necessary to convert information in videos into text. While essential, this conversion is a bottleneck, especially for large corpus of videos. It delays the timely use of enterprise videos to generate useful responses. We propose ViTA, a novel method that leverages two unique characteristics of VLMs to expedite the conversion process. As VLMs output more text tokens, they incur higher latency. In addition, large (heavyweight) VLMs can extract intricate details from images and videos, but they incur much higher latency per output token when compared to smaller (lightweight) VLMs that may miss details. To expedite conversion, ViTA first employs a lightweight VLM to quickly understand the gist or overview of an image or a video clip, and directs a heavyweight VLM (through prompt engineering) to extract additional details by using only a few (preset number of) output tokens. Our experimental results show that ViTA expedites the conversion time by as much as 43%, without compromising the accuracy of responses when compared to a baseline system that only uses a heavyweight VLM.

Keywords: Large Language Models (LLMs) Natural Language Processing Retrieval Augmented Generation (RAG) Video Analytics Vision Language Models (VLMs)

Scattering Prompt Tuning: A Fine-tuned Foundation Model for SAR Object Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Guo, Weilong; Li, Shengyang; Yang, Jian

Affiliations: Chinese Acad Sci Key Lab Space Utilizat Beijing 100864 Peoples R China; Chinese Acad Sci Technol & Engn Ctr Space Utilizat Beijing 100864 Peoples R China; Univ Chinese Acad Sci Beijing Peoples R China

ISBN: (print) 9798350365474

Synthetic Aperture Radar (SAR) serves as a vital tool in various earth observation applications, providing robust imaging under challenging weather conditions. While the fine-tuned foundation models excel in many downstream tasks, they struggle with SAR object recognition because of SAR's unique imaging and scattering characteristics. In this study, we propose a novel approach named Scattering Prompt Tuning (SPT) based vision foundation model. It uses SAR image scattering information as a prompt and integrates learnable parameters into the pre-trained model's input space to help learn SAR's unique information. We also employ a lightweight Residual AdapterMLP for fine-tuning, design a Sequential Feature Aggregation (SFA) to selectively fuse features from different transformer blocks effectively, and develop a Dynamic Distributional Contrast loss (DCLoss) to maintain the proper distance between different objects in feature space. Additionally, a four-stage training strategy, incorporating semi-supervised learning, is deployed to enhance SAR object recognition performance further. Our approach reaches a Top-1 accuracy of 37.9% and an AUROC of 0.83 on the final dataset, winning the first place in the SAR Classification track of PBVS 2024 Multi-modal Aerial View Object Classification Challenge, which is better than the latest advanced fine-tuned foundation models.

Keywords: Fine-tuned Foundation Model Object recognition SAR

Extending global-local view alignment for self-supervised learning with remote sensing imagery

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wanyan, Xinye; Seneviratne, Sachith; Shen, Shuchang; Kirley, Michael

ISBN: (print) 9798350365474

Since large number of high-quality remote sensing images are readily accessible, exploiting the corpus of images with less manual annotation draws increasing attention. Self-supervised models acquire general feature representations by formulating a pretext task that generates pseudolabels for massive unlabeled data to provide supervision for training. While prior studies have explored multiple self-supervised learning techniques in remote sensing domain, pretext tasks based on local-global view alignment remain underexplored, despite achieving state-of-the-art results on natural imagery. Inspired by DINO [6], which employs an effective representation learning structure with knowledge distillation based on global-local view alignment, we formulate two pretext tasks for self-supervised learning on remote sensing imagery (SSLRS). Using these tasks, we explore the effectiveness of positive temporal contrast as well as multi-sized views on SSLRS. We extend DINO and propose DINO-MC which uses local views of various sized crops instead of a single fixed size in order to alleviate the limited variation in object size observed in remote sensing imagery. Our experiments demonstrate that even when pre-trained on only 10% of the dataset, DINO-MC performs on par or better than existing state-of-the-art SSLRS methods on multiple remote sensing tasks, while using less computational resources. All codes, models, and results are released at https://***/WennyXY/DINO-MC.

Keywords: computer vision remote sensing imagery self-supervised learning

Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Luo, Ziwei; Gustafsson, Fredrik K.; Zhao, Zheng; Sjolund, Jens; Schon, Thomas B.

Affiliations: Uppsala Univ Uppsala Sweden; Karolinska Inst Stockholm Sweden

ISBN: (print) 9798350365474

Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained in specific datasets fail to recover images that have out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that contains multiple common degradations such as blur, resize, noise, and JPEG compression. Then we introduce robust training for a degradation-aware CLIP model to extract enriched image content features to assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Built upon it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task illustrate that the proposed posterior sampling improves image generation quality for various degradations.

Keywords: Diffusion model Image restoration real-world Super-resolution

DiCo-NeRF: Difference of Cosine Similarity for Neural Rendering of Fisheye Driving Scenes

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Choi, Jiho; Hwang, Gyutae; Lee, Sang Jun

Affiliations: Jeonbuk Natl Univ Jeonju South Korea

ISBN: (print) 9798350365474

Neural radiance fields have emerged in the field of autonomous driving, which contributes to improve perception of the complex 3D environment through the reconstruction of geometry and appearance. Moving objects and sky for outdoor environment is challenging to optimize the NeRF model. Previous work addresses these challenges through preprocessing such as masking; however, the masking process requires additional ground-truth data and a segmentation network. We propose DiCo-NeRF, an approach for driving scenes by leveraging cosine similarity map differences of vision-language aligned model. DiCo-NeRF investigates the correlation between rendered patches and pre-defined text and adjusts the loss of challenging patches, such as moving objects and the sky. Our neural radiance field utilizes embedding vectors from a pre-trained CLIP to obtain the cosine similarity maps. We introduce SimLoss, a loss function aimed at regulating the color field of NeRF based on the quantified distribution differences between ground-truth and rendered similarity maps. Unlike previous NeRF models that used driving datasets, our approach does not require additional input, such as sensor data, to the model. Experimental results demonstrate that the incorporation of language semantic cues improves the performance of the novel view synthesis task, particularly in complex driving environments. We conducted experiments that included fisheye driving scenes from the KITTI360 and real-world datasets. Our code is available at https://***/ziiho08/DiCoNeRF.

Keywords: Autonomous driving Fisheye camera Neural Radiance Fields Vision-Language models

Generalized Single-Image-Based Morphing Attack Detection Using Deep Representations from Vision Transformer

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Haoyu; Ramachandra, Raghavendra; Raja, Kiran; Busch, Christoph

Affiliations: Norwegian Univ Sci & Technol Trondheim Norway; Darmstadt Univ Appl Sci Darmstadt Germany

ISBN: (print) 9798350365474

Face morphing attacks have posed severe threats to Face Recognition Systems (FRS), which are operated in border control and passport issuance use cases. Correspondingly, morphing attack detection algorithms (MAD) are needed to defend against such attacks. MAD approaches must be robust enough to handle unknown attacks in an open-set scenario where attacks can originate from various morphing generation algorithms, post-processing and the diversity of printers/scanners. The problem of generalization is further pronounced when the detection has to be made on a single suspected image. In this paper, we propose a generalized single-image-based MAD (S-MAD) algorithm by learning the encoding from Vision Transformer (ViT) architecture. Compared to CNN-based architectures, ViT model has the advantage on integrating local and global information and hence can be suitable to detect the morphing traces widely distributed among the face region. Extensive experiments are carried out on face morphing datasets generated using publicly available FRGC face datasets. Several state-of-the-art (SOTA) MAD algorithms, including representative ones that have been publicly evaluated, have been selected and benchmarked with our ViT-based approach. Obtained results demonstrate the improved detection performance of the proposed S-MAD method on inter-dataset testing (when different data is used for training and testing) and comparable performance on intra-dataset testing (when the same data is used for training and testing) experimental protocol.

Keywords: Face recognition Morphing Attack Detection

Divide and Conquer Boosting for Enhanced Traffic Safety Description and Analysis with Large Vision Language Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Khai Trinh Xuan Khoi Nguyen Nguyen Bach Hoang Ngo Vu Dinh Xuan Minh-Hung An Quang-Vinh Dinh

Affiliations: Ho Chi Minh City Univ Technol VNU HCM Ho Chi Minh City Vietnam; Univ Sci VNU HCM Ho Chi Minh City Vietnam; Univ Informat Technol VNU HCM Ho Chi Minh City Vietnam; FPT Telecom Hanoi Vietnam; AI Lab AI VIETNAM Ho Chi Minh City Vietnam; Vietnam Natl Univ Ho Chi Minh City Ho Chi Minh City Vietnam

ISBN: (print) 9798350365474

The increasing complexity of traffic dynamics has underscored the necessity for advanced traffic safety description and analysis, challenging the efficacy of current methodologies in comprehensively understanding and predicting safety conditions from transportation videos. This paper addresses these challenges by analyzing specific segments crucial for precise traffic safety descriptions. Through this examination, we introduce an innovative preprocessing method named "segment extraction", facilitating the creation of a novel segment-based training dataset. Additionally, we present a practical two-stage training framework specifically tailored for this dataset. This framework is designed to produce accurate descriptions of traffic safety by incorporating the unique attributes of our segment-based training datasets. Leveraging these contributions, our method achieved a notable 2nd rank with a score of 32.8877 in the AI City Challenge Track2 test set: Traffic Safety Description and Analysis 2024. The source code for the proposed approaches is openly accessible at https://***/AIVIETNAMResearch/AI-CIty2024-Track2

Keywords: large language model large vision language model video captioning video instance captioning video-text model

Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Dhakal, Aayush; Ahmad, Adeel; Khanal, Subash; Sastry, Srikumar; Kerner, Hannah; Jacobs, Nathan

Affiliations: Washington Univ St Louis MO 63110 USA; Taylor Geospatial Inst St Louis MO USA; Arizona State Univ Tempe AZ 85287 USA

ISBN: (print) 9798350365474

We propose a weakly supervised approach for creating maps using free-form textual descriptions. We refer to this work of creating textual maps as zero-shot mapping. Prior works have approached mapping tasks by developing models that predict a fixed set of attributes using overhead imagery. However, these models are very restrictive as they can only solve highly specific tasks for which they were trained. Mapping text, on the other hand, allows us to solve a large variety of mapping problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset with 6.1M pairs of overhead and ground-level images. For a given location and overhead image, our model predicts the expected CLIP embeddings of the ground-level scenery. The predicted CLIP embeddings are then used to learn about the textual space associated with that location. Sat2Cap is also conditioned on date-time information, allowing it to model temporally varying concepts over a location. Our experimental results demonstrate that our models successfully capture ground-level concepts and allow large-scale mapping of fine-grained textual queries. Our approach does not require any text-labeled data, making the training easily scalable. The code, dataset, and models will be made publicly available.

Keywords: Contrastive Learning Text-based Mapping Vision-Language Model


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code: 010021
