Search Results - Inner Mongolia University Library

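Search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016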
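
The field codes and operator rules above can be sketched as a small query builder. This is only an illustration: `term`, `group`, and `combine` are hypothetical helper names and not part of the catalogue, and the set of field codes is copied from the legend above. The sketch assumes only what the help text states: operators are uppercase and have one space on each side.

```python
# Hypothetical helpers for composing advanced-search expressions.
# Field codes are taken from the help text; nothing here calls the catalogue.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(code: str, value: str) -> str:
    """Return a single 'CODE=value' search term."""
    if code not in FIELD_CODES:
        raise ValueError(f"unknown field code: {code!r}")
    return f"{code}={value}"


def group(expression: str) -> str:
    """Wrap an expression in parentheses so it is evaluated as a unit."""
    return f"({expression})"


def combine(operator: str, *parts: str) -> str:
    """Join two or more parts with an uppercase operator and single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    if len(parts) < 2:
        raise ValueError("need at least two parts to combine")
    return f" {operator} ".join(parts)


# Rebuild Example 1 from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
# (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase. Whether the catalogue itself rejects or silently ignores a lowercase operator is not stated in the help text.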

Search condition: "Any field = 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,889 records in total; showing records 201-210

From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Rongjie; Zhang, Songyang; Lin, Dahua; Chen, Kai; He, Xuming
Affiliations: ShanghaiTech Univ Sch Informat Sci & Technol Shanghai Peoples R China; Shanghai AI Lab Shanghai Peoples R China; Shanghai Engn Res Ctr Intelligent Vis & Imaging Shanghai Peoples R China

ISBN: (print) 9798350353006

Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for down-stream reasoning tasks. Despite recent advancements, existing methods struggle to generate scene graphs with novel visual relation concepts. To address this challenge, we introduce a new open-vocabulary SGG framework based on sequence generation. Our framework leverages vision-language pre-trained models (VLM) by incorporating an image-to-graph generation paradigm. Specifically, we generate scene graph sequences via image-to-text generation with VLM and then construct scene graphs from these sequences. By doing so, we harness the strong capabilities of VLM for open-vocabulary SGG and seamlessly integrate explicit relational modeling for enhancing the VL tasks. Experimental results demonstrate that our design not only achieves superior performance with an open vocabulary but also enhances downstream vision-language task performance through explicit relation modeling knowledge.

Keywords: Scene Graph Generation; Scene Understanding; vision-language

Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Luo, Jiayun; Khandelwal, Siddhesh; Sigal, Leonid; Li, Boyang
Affiliations: Nanyang Technol Univ Singapore Singapore; Univ British Columbia Vector Inst AI Vancouver BC Canada

ISBN: (print) 9798350353013; 9798350353006

From image-text pairs, large-scale vision-language models (VLMs) learn to implicitly associate image regions with words, which prove effective for tasks like visual question answering. However, leveraging the learned association for open-vocabulary semantic segmentation remains a challenge. In this paper, we propose a simple, yet extremely effective, training-free technique, Plug-and-Play Open-Vocabulary Semantic Segmentation (PnP-OVSS) for this task. PnP-OVSS leverages a VLM with direct text-to-image cross-attention and an image-text matching loss. To balance between over-segmentation and under-segmentation, we introduce Salience Dropout; by iteratively dropping patches that the model is most attentive to, we are able to better resolve the entire extent of the segmentation mask. PnP-OVSS does not require any neural network training and performs hyperparameter tuning without the need for any segmentation annotations, even for a validation set. PnP-OVSS demonstrates substantial improvements over comparable baselines (+29.4% mIoU on Pascal VOC, +13.2% mIoU on Pascal Context, +14.0% mIoU on MS COCO, +2.4% mIoU on COCO Stuff) and even outperforms most baselines that conduct additional network training on top of pretrained VLMs. Our codebase is at https://***/letitiabanana/PnP-OVSS.

Keywords: open-vocabulary semantic segmentation; training-free

Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Hong; Yan, Pei; Xiang, Sihe; Tan, Yihua
Affiliations: Huazhong Univ Sci & Technol Sch Artificial Intelligence & Automat Hubei Engn Res Ctr Machine Vision & Intelligent S Wuhan Peoples R China

ISBN: (print) 9798350353006

Point Cloud Registration is a critical and challenging task in computer vision. Recent advancements have predominantly embraced a coarse-to-fine matching mechanism, with the key to matching the superpoints located in patches with interframe consistent structures. However, previous methods still face challenges with ambiguous matching, because the interference information aggregated from irrelevant regions may disturb the capture of interframe consistency relations, leading to wrong matches. To address this issue, we propose Dynamic Cues-Assisted Transformer (DCATr). Firstly, the interference from irrelevant regions is greatly reduced by constraining attention to certain cues, i.e., regions with highly correlated structures of potential corresponding superpoints. Secondly, cues-assisted attention is designed to mine the inter-frame consistency relations, while more attention is assigned to pairs with high consistent confidence in feature aggregation. Finally, a dynamic updating fashion is proposed to facilitate mining richer consistency information, further improving aggregated features' distinctiveness and relieving matching ambiguity. Extensive evaluations on indoor and outdoor standard benchmarks demonstrate that DCATr outperforms all state-of-the-art methods.

Keywords: ambiguous feature matching; Point cloud registration; transformer

AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Qazi, Ahmed; Razzaq, Taha; Iqbal, Asim
Affiliations: Tibbling Technol Redmond WA 98052 USA

ISBN: (print) 9798350365474

We introduce a multimodal vision framework for precision livestock farming, harnessing the power of GroundingDINO, HQSAM, and ViTPose models. This integrated suite enables comprehensive behavioral analytics from video data without invasive animal tagging. GroundingDINO generates accurate bounding boxes around livestock, while HQSAM segments individual animals within these boxes. ViTPose estimates key body points, facilitating posture and movement analysis. Demonstrated on a sheep dataset with grazing, running, sitting, standing, and walking activities, our framework extracts invaluable insights: activity and grazing patterns, interaction dynamics, and detailed postural evaluations. Applicable across species and video resolutions, this framework revolutionizes non-invasive livestock monitoring for activity detection, counting, health assessments, and posture analyses. It empowers data-driven farm management, optimizing animal welfare and productivity through AI-powered behavioral understanding.

Keywords: Livestock

Transductive Zero-Shot and Few-Shot CLIP

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Martin, Segolene; Huang, Yunshi; Shakeri, Fereshteh; Pesquet, Jean-Christophe; Ben Ayed, Ismail
Affiliations: Univ Paris Saclay CVN Cent Supelec INRIA Paris France; ETS Montreal Montreal PQ Canada

ISBN: (print) 9798350353006

Transductive inference has been widely investigated in few-shot image classification, but completely overlooked in the recent, fast-growing literature on adapting vision-language models like CLIP. This paper addresses the transductive zero-shot and few-shot CLIP classification challenge, in which inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently. We initially construct informative vision-text probability features, leading to a classification problem on the unit simplex set. Inspired by Expectation-Maximization (EM), our optimization-based classification objective models the data probability distribution for each class using a Dirichlet law. The minimization problem is then tackled with a novel block Majorization-Minimization algorithm, which simultaneously estimates the distribution parameters and class assignments. Extensive numerical experiments on 11 datasets underscore the benefits and efficacy of our batch inference approach. On zero-shot tasks with test batches of 75 samples, our approach yields near 20% improvement in ImageNet accuracy over CLIP's zero-shot performance. Additionally, we outperform state-of-the-art methods in the few-shot setting. The code is available at: https://***/SegoleneMartin/transductive-CLIP.

Keywords: expectation-maximization; few-shot; text-vision models; transductive learning; zero-shot

Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Peng, Wujian; Xi, Sicheng; You, Zuyao; Lan, Shiyi; Wu, Zuxuan
Affiliations: Fudan Univ Sch CS Shanghai Key Lab Intell Info Proc Shanghai Peoples R China; Shanghai Collaborat Innovat Ctr Intelligent Visua Shanghai Peoples R China; NVIDIA Shenzhen Guangdong Peoples R China

ISBN: (print) 9798350353006

Vision language models (VLM) have demonstrated remarkable performance across various downstream tasks. However, understanding fine-grained visual-linguistic concepts, such as attributes and inter-object relationships, remains a significant challenge. While several benchmarks aim to evaluate VLMs in finer granularity, their primary focus remains on the linguistic aspect, neglecting the visual dimension. Here, we highlight the importance of evaluating VLMs from both a textual and visual perspective. We introduce a progressive pipeline to synthesize images that vary in a specific attribute while ensuring consistency in all other aspects. Utilizing this data engine, we carefully design a benchmark, SPEC, to diagnose the comprehension of object size, position, existence, and count. Subsequently, we conduct a thorough evaluation of four leading VLMs on SPEC. Surprisingly, their performance is close to random guess, revealing significant limitations. With this in mind, we propose a simple yet effective approach to optimize VLMs in fine-grained understanding, achieving significant improvements on SPEC without compromising the zero-shot performance. Results on two additional fine-grained benchmarks also show consistent improvements, further validating the transferability of our approach. Code and data are available at https://***/wjpoom/SPEC.

Keywords: Fine-grained understanding; vision language model

Active Prompt Learning in Vision Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bang, Jihwan; Ahn, Sumyeong; Lee, Jae-Gil
Affiliations: Korea Adv Inst Sci & Technol Daejeon South Korea; Michigan State Univ E Lansing MI USA

ISBN: (print) 9798350353006

Pre-trained Vision Language Models (VLMs) have demonstrated notable progress in various zero-shot tasks, such as classification and retrieval. Despite their performance, because improving performance on new tasks requires task-specific knowledge, their adaptation is essential. While labels are needed for the adaptation, acquiring them is typically expensive. To overcome this challenge, active learning, a method of achieving a high performance by obtaining labels for a small number of samples from experts, has been studied. Active learning primarily focuses on selecting unlabeled samples for labeling and leveraging them to train models. In this study, we pose the question, "how can the pre-trained VLMs be adapted under the active learning framework?" In response to this inquiry, we observe that (1) simply applying a conventional active learning framework to pre-trained VLMs may even degrade performance compared to random selection because of the class imbalance in labeling candidates, and (2) the knowledge of VLMs can provide hints for achieving the balance before labeling. Based on these observations, we devise a novel active learning framework for VLMs, denoted as PCB. To assess the effectiveness of our approach, we conduct experiments on seven different real-world datasets, and the results demonstrate that PCB surpasses conventional active learning and random sampling methods. Code is available at https://***/kaist-dmlab/pcb.

Projecting Trackable Thermal Patterns for Dynamic Computer Vision

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Sheinin, Mark; Sankaranarayanan, Aswin C.; Narasimhan, Srinivasa G.
Affiliations: Carnegie Mellon Univ Pittsburgh PA 15213 USA

ISBN: (print) 9798350353006

Adding artificial patterns to objects, like QR codes, can ease tasks such as object tracking, robot navigation, and conveying information (e.g., a label or a website link). However, these patterns require a physical application and they alter the object's appearance. Conversely, projected patterns can temporarily change the object's appearance, aiding tasks like 3D scanning and retrieving object textures and shading. However, projected patterns impede dynamic tasks like object tracking because they do not 'stick' to the object's surface. Or do they? This paper introduces a novel approach combining the advantages of projected and persistent physical patterns. Our system projects heat patterns using a laser beam (similar in spirit to a LIDAR), which a thermal camera observes and tracks. Such thermal patterns enable tracking poorly-textured objects whose tracking is highly challenging with standard cameras while not affecting the object's appearance or physical properties. To avail these thermal patterns in existing vision frameworks, we train a network to reverse heat diffusion's effects and remove inconsistent pattern points between different thermal frames. We prototyped and tested this approach on dynamic vision tasks like structure from motion, optical flow, and object tracking of everyday textureless objects.

Keywords: 3d reconstruction heat Heat diffusion laser optical flow slam structure from motion thermal tracking

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Liu, Zhixuan; Schaldenbrand, Peter; Okogwu, Beverley-Claire; Peng, Wenxuan; Yun, Youngsik; Hundt, Andrew; Kim, Jihie; Oh, Jean
Affiliations: Carnegie Mellon Univ Pittsburgh PA 15213 USA; Nanyang Technol Univ Singapore Singapore; Dongguk Univ Seoul South Korea

ISBN: (print) 9798350353006

Accurate representation in media is known to improve the well-being of the people who consume it. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of cultures. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset that we call the Cross-Cultural Understanding Benchmark (CCUB) and (2) proposing a novel Self-Contrastive Fine-Tuning (SCoFT, pronounced /soft/) method that leverages the model's known biases to self-improve. SCoFT is designed to prevent overfitting on small datasets, encode only high-level information from the data, and shift the generated distribution away from misrepresentations encoded in a pretrained model. Our user study conducted on 51 participants from 5 different countries based on their self-selected national cultural affiliation shows that fine-tuning on CCUB consistently generates images with higher cultural relevance and fewer stereotypes when compared to the Stable Diffusion baseline, which is further improved with our SCoFT technique. Resources and code are at https://***/SCoFT.

Keywords: Computer Vision for Social Good; Image Synthesis

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Boqiang; Xie, Hongtao; Gao, Zuan; Wang, Yuxin
Affiliations: Univ Sci & Technol China Hefei Peoples R China

ISBN: (print) 9798350353006

Scene text images contain not only style information (font, background) but also content information (character, texture). Different scene text tasks need different information, but previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance. We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling these two types of features for improved adaptability in better addressing various downstream tasks (choose what you really need). Specifically, we synthesize a dataset of image pairs with identical style but different content. Based on the dataset, we decouple the two types of features by the supervision design. Clearly, we directly split the visual representation into style and content features, the content features are supervised by a text recognition loss, while an alignment loss aligns the style features in the image pairs. Then, style features are employed in reconstructing the counterpart image via an image decoder with a prompt that indicates the counterpart's content. Such an operation effectively decouples the features based on their distinctive properties. To the best of our knowledge, this is the first time in the field of scene text that disentangles the inherent properties of the text images. Our method achieves state-of-the-art performance in Scene Text Recognition, Removal, and Editing.

Keywords: Scene Text; Scene Understanding


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
