ISBN (print): 9798350353006
Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from object hallucinations, where models generate plausible yet incorrect outputs that include objects not present in the image. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. The proposed VCD effectively reduces over-reliance on statistical bias and unimodal priors, two essential causes of object hallucinations. This adjustment ensures the generated content is closely grounded in the visual input, resulting in contextually accurate outputs. Our experiments show that VCD, without additional training or external tools, significantly mitigates object hallucination across different LVLM families. Beyond mitigating object hallucinations, VCD also excels on general LVLM benchmarks, highlighting its wide-ranging applicability.
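A minimal sketch of the core contrastive-decoding step described above, assuming next-token logits are obtained from the same LVLM conditioned on the clean image and on a distorted copy; the alpha and beta hyperparameters and the plausibility cutoff below are illustrative, not the paper's tuned values.

```python
import torch
import torch.nn.functional as F

def vcd_next_token_logits(logits_clean, logits_distorted, alpha=1.0, beta=0.1):
    """Contrast next-token logits from the clean and the distorted visual input.
    Both tensors have shape [vocab_size]."""
    # Adaptive plausibility constraint: keep only tokens that are reasonably
    # likely under the clean visual input.
    probs_clean = F.softmax(logits_clean, dim=-1)
    plausible = probs_clean >= beta * probs_clean.max()

    # Amplify evidence supported by the clean image and subtract what the model
    # would also produce without reliable visual grounding.
    contrasted = (1 + alpha) * logits_clean - alpha * logits_distorted
    return contrasted.masked_fill(~plausible, float("-inf"))

# The next token is then sampled (or greedily picked) from the contrasted logits.
```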
ISBN (print): 9798350353006
Recent advancements in Vision-Language Models (VLMs) have marked a significant leap in bridging the gap between computer vision and natural language processing. However, traditional VLMs, trained through contrastive learning on limited and noisy image-text pairs, often lack the spatial and linguistic understanding to generalize well to dense vision tasks or less common languages. Our approach, Solid Foundation CLIP (SF-CLIP), circumvents this issue by implicitly building on the solid visual and language understanding of foundational models trained on vast amounts of unimodal data. SF-CLIP integrates contrastive image-text pretraining with masked knowledge distillation from large foundational text and vision models. This methodology guides our VLM in developing robust text and image representations. As a result, SF-CLIP shows exceptional zero-shot classification accuracy and enhanced image and text retrieval capabilities, setting a new state of the art for ViT-B/16 trained on YFCC15M and CC12M. Moreover, the dense per-patch supervision enhances our zero-shot and linear-probe performance in semantic segmentation tasks. A remarkable aspect of our model is its multilingual proficiency, evidenced by strong retrieval results in multiple languages despite being trained predominantly on English data. We achieve all of these improvements without sacrificing training efficiency, through our selective application of masked distillation and the inheritance of teacher word embeddings.
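A rough sketch of how a contrastive image-text objective can be combined with masked feature distillation from a frozen vision teacher; the tensor shapes, the smooth-L1 distillation term, and the lambda_distill weight are assumptions for illustration, not SF-CLIP's exact recipe.

```python
import torch
import torch.nn.functional as F

def combined_pretraining_loss(img_emb, txt_emb, student_patch_feats,
                              teacher_patch_feats, mask,
                              temperature=0.07, lambda_distill=1.0):
    """img_emb, txt_emb: [B, D] normalized global embeddings.
    student_patch_feats, teacher_patch_feats: [B, N, C] patch features.
    mask: [B, N] boolean, True where an image patch was masked."""
    # Symmetric InfoNCE over the in-batch image-text similarity matrix.
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_itc = (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    # Distill the frozen teacher's patch features only at masked positions.
    diff = F.smooth_l1_loss(student_patch_feats, teacher_patch_feats,
                            reduction="none").mean(dim=-1)       # [B, N]
    loss_distill = (diff * mask.float()).sum() / mask.float().sum().clamp(min=1.0)

    return loss_itc + lambda_distill * loss_distill
```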
ISBN (print): 9798350353006
Autonomous vehicle (AV) systems rely on robust perception models as a cornerstone of safety assurance. However, objects encountered on the road exhibit a long-tailed distribution, with rare or unseen categories posing challenges to a deployed perception model. This necessitates an expensive process of continuously curating and annotating data with significant human effort. We propose to leverage recent advances in vision-language and large language models to design an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. This process operates iteratively, allowing for continuous self-improvement of the model. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.
ISBN (print): 9798350353013; 9798350353006
This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) for few-shot segmentation. In contrast to conventional few-shot segmentation methods that rely only on the limited and biased information from annotated support images, LLaFS leverages the vast prior knowledge of the LLM as an effective supplement and directly uses the LLM to segment images in a few-shot manner. To enable the text-based LLM to handle image-related tasks, we carefully design an input instruction that allows the LLM to produce segmentation results represented as polygons, and propose a region-attribute table to simulate the human visual mechanism and provide multi-modal guidance. We also synthesize pseudo samples and use curriculum learning for pretraining to augment data and achieve better optimization. LLaFS achieves state-of-the-art results on multiple datasets, showing the potential of using LLMs for few-shot computer vision tasks.
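The abstract does not specify the exact output protocol, so the snippet below only illustrates the general idea of turning a polygon vertex list, as an LLM might emit in text, into a binary segmentation mask; the response format and image size are hypothetical.

```python
from PIL import Image, ImageDraw

def polygon_to_mask(vertices, height, width):
    """Rasterize a polygon, given as a list of (x, y) vertices, into a binary mask."""
    mask = Image.new("L", (width, height), 0)
    ImageDraw.Draw(mask).polygon(vertices, outline=1, fill=1)
    return mask

# Hypothetical LLM response listing polygon vertices for the target object.
response = "(10,12) (48,10) (52,40) (14,44)"
vertices = [tuple(int(v) for v in p.strip("()").split(",")) for p in response.split()]
mask = polygon_to_mask(vertices, height=64, width=64)   # PIL image with 0/1 pixels
```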
ISBN (print): 9798350353006
Solving image and video jigsaw puzzles poses the challenging task of rearranging image fragments or video frames from unordered sequences to restore meaningful images and video sequences. Existing approaches often hinge on discriminative models tasked with predicting either the absolute positions of puzzle elements or the permutation actions applied to the original data. Unfortunately, these methods face limitations in effectively solving puzzles with a large number of elements. In this paper, we propose JPDVT, an innovative approach that harnesses diffusion transformers to address this challenge. Specifically, we generate positional information for image patches or video frames, conditioned on their underlying visual content. This information is then employed to accurately assemble the puzzle pieces in their correct positions, even in scenarios involving missing pieces. Our method achieves state-of-the-art performance on several datasets.
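The generative part, a diffusion transformer that predicts positional information from patch content, is not sketched here; the snippet only shows how predicted grid positions would be used to reassemble shuffled pieces, the final step the abstract describes. Shapes and names are illustrative.

```python
import torch

def assemble_patches(patches, pred_positions, grid_size):
    """Place shuffled puzzle pieces back on the grid.

    patches:        [N, C, p, p] shuffled pieces
    pred_positions: [N] integer tensor of predicted grid indices per piece
    grid_size:      pieces per side, with N == grid_size ** 2
    """
    n, c, p, _ = patches.shape
    canvas = torch.zeros(c, grid_size * p, grid_size * p,
                         device=patches.device, dtype=patches.dtype)
    for piece, pos in zip(patches, pred_positions.tolist()):
        row, col = divmod(int(pos), grid_size)
        canvas[:, row * p:(row + 1) * p, col * p:(col + 1) * p] = piece
    return canvas
```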
ISBN (print): 9798350353006
We consider a critical issue of false negatives in Vision-Language Pre-training (VLP), a challenge that arises from the inherent many-to-many correspondence of image-text pairs in large-scale web-crawled datasets. The presence of false negatives can impede optimal performance and even lead to a significant performance drop. To address this challenge, we propose MAFA (MAnaging FAlse negatives), which consists of two pivotal components built upon the recently developed GRouped mIni-baTch sampling (GRIT) strategy: 1) an efficient connection-mining process that identifies and converts false negatives into positives, and 2) label smoothing for the image-text contrastive (ITC) loss. Our comprehensive experiments verify the effectiveness of MAFA across multiple downstream tasks, emphasizing the crucial role of addressing false negatives in VLP, potentially even surpassing the importance of addressing false positives. In addition, the compatibility of MAFA with recent BLIP-family models is also demonstrated. Code is available at https://***/jaeseokbyun/MAFA.
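An illustrative ITC loss that treats mined false negatives as additional positives and applies label smoothing; this is a generic sketch of the two ingredients named above, not MAFA's exact objective, and positive_mask and smoothing are assumed inputs.

```python
import torch
import torch.nn.functional as F

def smoothed_targets(positive_mask, smoothing):
    """Row-normalize a boolean positive mask and blend with a uniform
    distribution over the batch (label smoothing)."""
    targets = positive_mask.float()
    targets = targets / targets.sum(dim=1, keepdim=True)
    return (1 - smoothing) * targets + smoothing / targets.size(1)

def itc_loss(img_emb, txt_emb, positive_mask, temperature=0.07, smoothing=0.1):
    """img_emb, txt_emb: [B, D] normalized embeddings. positive_mask: [B, B]
    boolean, True on the diagonal and for image-text pairs re-labeled as
    positives by a connection-mining step."""
    logits = img_emb @ txt_emb.t() / temperature
    t_i2t = smoothed_targets(positive_mask, smoothing)
    t_t2i = smoothed_targets(positive_mask.t(), smoothing)
    loss_i2t = -(t_i2t * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    loss_t2i = -(t_t2i * F.log_softmax(logits.t(), dim=1)).sum(dim=1).mean()
    return (loss_i2t + loss_t2i) / 2
```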
ISBN (print): 9798350353013; 9798350353006
We propose a method that can generate cinemagraphs automatically from a still landscape image using a pre-trained StyleGAN. Inspired by the success of recent unconditional video generation, we leverage a powerful pre-trained image generator to synthesize high-quality cinemagraphs. Unlike previous approaches that mainly utilize the latent space of a pre-trained StyleGAN, our approach utilizes its deep feature space for both GAN inversion and cinemagraph generation. Specifically, we propose multi-scale deep feature warping (MSDFW), which warps the intermediate features of a pre-trained StyleGAN at different resolutions. By using MSDFW, the generated cinemagraphs are of high resolution and exhibit plausible looping animation. We demonstrate the superiority of our method through user studies and quantitative comparisons with state-of-the-art cinemagraph generation methods and a video generation method that uses a pre-trained StyleGAN.
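A simplified stand-in for multi-scale feature warping: the same motion field is resized and rescaled to each feature resolution and applied by backward warping with grid_sample. How MSDFW actually derives the flow and injects it into the StyleGAN features is not reproduced here.

```python
import torch
import torch.nn.functional as F

def warp_feature(feat, flow):
    """Backward-warp a feature map [B, C, H, W] with a pixel-displacement
    flow field [B, 2, H, W] using grid_sample."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)   # [2, H, W]
    grid = base.unsqueeze(0) + flow                                # sampling coords
    # Normalize to [-1, 1] in (x, y) order as expected by grid_sample.
    grid_x = 2 * grid[:, 0] / (w - 1) - 1
    grid_y = 2 * grid[:, 1] / (h - 1) - 1
    grid_norm = torch.stack((grid_x, grid_y), dim=-1)              # [B, H, W, 2]
    return F.grid_sample(feat, grid_norm, align_corners=True)

def multiscale_warp(features_by_res, flow_full):
    """Apply one motion field to intermediate features at several resolutions
    by resizing and rescaling the full-resolution flow [B, 2, H, W]."""
    warped = {}
    for res, feat in features_by_res.items():
        scale = res / flow_full.shape[-1]
        flow = F.interpolate(flow_full, size=(res, res), mode="bilinear",
                             align_corners=True) * scale
        warped[res] = warp_feature(feat, flow)
    return warped
```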
ISBN (print): 9798350353013; 9798350353006
The remarkable success of Vision Transformers in Artificial Neural Networks (ANNs) has led to a growing interest in incorporating the self-attention mechanism and transformer-based architectures into Spiking Neural Networks (SNNs). While existing methods propose spiking self-attention mechanisms that are compatible with SNNs, they lack a reasonable scaling method, and the overall architectures they propose suffer from a bottleneck in effectively extracting local features. To address these challenges, we propose a novel spiking self-attention mechanism named Dual Spike Self-Attention (DSSA) with a reasonable scaling method. Based on DSSA, we propose a novel spiking Vision Transformer architecture called SpikingResformer, which combines a ResNet-based multi-stage architecture with our proposed DSSA to improve both performance and energy efficiency while reducing parameters. Experimental results show that SpikingResformer achieves higher accuracy with fewer parameters and lower energy consumption than other spiking Vision Transformer counterparts. Notably, our SpikingResformer-L achieves 79.40% top-1 accuracy on ImageNet with 4 time-steps, which is the state-of-the-art result in the SNN field. Code is available at https://***/xyshi2000/SpikingResformer.
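A generic sketch of spike-form self-attention with explicit scaling, to make attention over binary spike tensors concrete; it is not the paper's DSSA, and the thresholds and the 1/D and 1/N scaling choices are assumptions (surrogate-gradient handling is omitted).

```python
import torch

def heaviside_spike(x, threshold=0.5):
    """Binarize a membrane potential into spikes (surrogate gradients omitted)."""
    return (x >= threshold).float()

def spiking_self_attention(q_spikes, k_spikes, v_spikes):
    """q/k/v_spikes: [B, N, D] float tensors of 0/1 spikes. Because Q and K are
    binary, Q @ K^T counts coincident spikes, so dividing by D (rather than
    sqrt(D)) keeps the pre-spike potential in a reasonable range."""
    d = q_spikes.size(-1)
    attn_potential = q_spikes @ k_spikes.transpose(-2, -1) / d    # [B, N, N]
    attn_spikes = heaviside_spike(attn_potential)
    n = attn_spikes.size(-1)
    out_potential = attn_spikes @ v_spikes / n                    # [B, N, D]
    return heaviside_spike(out_potential)
```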
ISBN (print): 9798350353013; 9798350353006
We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation. Severe self-occlusion and out-of-view limbs in egocentric camera views make accurate pose estimation a challenging problem. To address the challenge, prior methods employ joint heatmaps, probabilistic 2D representations of the body pose, but heatmap-to-3D pose conversion remains an inaccurate process. We propose a novel heatmap-to-3D lifting method composed of the Grid ViT Encoder and the Propagation Network. The Grid ViT Encoder summarizes joint heatmaps into an effective feature embedding using self-attention. The Propagation Network then estimates the 3D pose by utilizing skeletal information to better estimate the positions of obscured joints. Our method significantly outperforms the previous state of the art both qualitatively and quantitatively, as demonstrated by a 23.9% reduction in MPJPE error. Our source code is available on GitHub (1).
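A tiny illustration of the grid idea: per-joint heatmaps are tiled into one grid image so a ViT-style encoder can attend across joints in a single pass. The layout, grid width, and any normalization are hypothetical; EgoTAP's Grid ViT Encoder and Propagation Network are not reproduced.

```python
import torch

def tile_heatmaps_to_grid(heatmaps, grid_cols=4):
    """Arrange per-joint heatmaps [J, H, W] into a single grid image."""
    j, h, w = heatmaps.shape
    grid_rows = (j + grid_cols - 1) // grid_cols
    canvas = torch.zeros(grid_rows * h, grid_cols * w,
                         device=heatmaps.device, dtype=heatmaps.dtype)
    for idx in range(j):
        r, c = divmod(idx, grid_cols)
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = heatmaps[idx]
    return canvas
```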
ISBN (print): 9798350353006
Trajectory prediction is fundamental in computer vision and autonomous driving, particularly for understanding pedestrian behavior and enabling proactive decision-making. Existing approaches in this field often assume precise and complete observational data, neglecting the challenges posed by out-of-view objects and by noise in sensor data due to limited camera range, physical obstructions, and the absence of ground truth for denoised sensor data. Such oversights are critical safety concerns, as they can result in missing essential, non-visible objects. To bridge this gap, we present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique. Our approach denoises noisy sensor observations in an unsupervised manner and precisely maps sensor-based trajectories of out-of-sight objects into visual trajectories. This method achieves state-of-the-art performance in out-of-sight noisy sensor trajectory denoising and prediction on the Vi-Fi and JRDB datasets. By enhancing trajectory prediction accuracy and addressing the challenges of out-of-sight objects, our work significantly contributes to improving the safety and reliability of autonomous driving in complex environments. Our work represents the first initiative towards Out-Of-Sight Trajectory prediction (OOSTraj), setting a new benchmark for future research.
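For concreteness, a plain pinhole projection of camera-frame sensor positions into pixel coordinates, i.e. the geometric version of mapping sensor-based trajectories into visual trajectories; OOSTraj learns this mapping jointly with unsupervised denoising, which is not shown, and the intrinsics below are placeholders.

```python
import numpy as np

def project_trajectory(points_cam, fx, fy, cx, cy):
    """Project camera-frame 3D points [N, 3] (x right, y down, z forward)
    to pixel coordinates [N, 2] with a pinhole camera model."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = fx * x / z + cx
    v = fy * y / z + cy
    return np.stack([u, v], axis=1)

# Placeholder intrinsics; real values come from the camera calibration.
traj_px = project_trajectory(np.array([[1.0, 0.2, 8.0], [1.1, 0.2, 7.5]]),
                             fx=1000.0, fy=1000.0, cx=640.0, cy=360.0)
```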