Search Results - Inner Mongolia University Library


Search criteria: "Any field = 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016"

21,008 records in total; showing records 81-90


Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yu, Qing; Tanaka, Mikihiro; Fujiwara, Kent

Affiliations: LY Corp Tokyo Japan

ISBN: (print) 9798350353013; 9798350353006

To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike the abundance of image data, the scarcity of motion data has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning, aiming to extract useful knowledge from the image domain and apply it to the motion domain. These motion patches, created by dividing and sorting skeleton joints based on body parts in motion sequences, are robust to varying skeleton structures, and can be regarded as color image patches in ViT. We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis, presenting a promising direction for addressing the issue of limited motion data. Our extensive experiments show that the proposed motion patches, used jointly with ViT, achieve state-of-the-art performance in the benchmarks of text-to-motion retrieval, and other novel challenging tasks, such as cross-skeleton recognition, zero-shot motion classification, and human interaction recognition, which are currently impeded by the lack of data.

Keywords: Motion Representation; Motion-Language Models; Text-Motion Retrieval

AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Liang, Mingfu; Su, Jong-Chyi; Schulter, Samuel; Garg, Sparsh; Zhao, Shiyu; Wu, Ying; Chandraker, Manmohan

Affiliations: Northwestern Univ Evanston IL 60208 USA; NEC Labs Amer Princeton NJ USA; Rutgers State Univ New Brunswick NJ USA; Univ Calif San Diego San Diego CA USA

ISBN: (print) 9798350353006

Autonomous vehicle (AV) systems rely on robust perception models as a cornerstone of safety assurance. However, objects encountered on the road exhibit a long-tailed distribution, with rare or unseen categories posing challenges to a deployed perception model. This necessitates an expensive process of continuously curating and annotating data with significant human effort. We propose to leverage recent advances in vision-language and large language models to design an Automatic Data Engine (AIDE) that automatically identifies issues, efficiently curates data, improves the model through auto-labeling, and verifies the model through generation of diverse scenarios. This process operates iteratively, allowing for continuous self-improvement of the model. We further establish a benchmark for open-world detection on AV datasets to comprehensively evaluate various learning paradigms, demonstrating our method's superior performance at a reduced cost.

Keywords: Automatic Data Engine; Autonomous Driving; Continual Training; Large Language Model; Novel Object Detection; Vision Language Model

Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Liu, Jinyang; Teshome, Wondmgezahu; Ghimire, Sandesh; Sznaier, Mario; Camps, Octavia

Affiliations: Northeastern Univ Boston MA 02115 USA; Qualcomm San Diego CA USA

ISBN: (print) 9798350353006

Solving image and video jigsaw puzzles poses the challenging task of rearranging image fragments or video frames from unordered sequences to restore meaningful images and video sequences. Existing approaches often hinge on discriminative models tasked with predicting either the absolute positions of puzzle elements or the permutation actions applied to the original data. Unfortunately, these methods face limitations in effectively solving puzzles with a large number of elements. In this paper, we propose JPDVT, an innovative approach that harnesses diffusion transformers to address this challenge. Specifically, we generate positional information for image patches or video frames, conditioned on their underlying visual content. This information is then employed to accurately assemble the puzzle pieces in their correct positions, even in scenarios involving missing pieces. Our method achieves state-of-the-art performance on several datasets.

Keywords: data imputation; diffusion models; Solving puzzles

Do Vision and Language Encoders Represent the World Similarly?

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Maniparambil, Mayug; Akshulakov, Raiymbek; Djilali, Yasser Abdelaziz Dahou; Seddik, Mohamed El Amine; Narayan, Sanath; Mangalam, Karttikeya; O'Connor, Noel E.

Affiliations: Dublin City Univ ML Labs Dublin Ireland; Univ Calif Berkeley Berkeley CA 94720 USA; Technol Innovat Inst Dublin Ireland

ISBN: (print) 9798350353006

Aligned text-image encoders such as CLIP have become the de-facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performances in their respective domains. This raises a central question: does an alignment exist between uni-modal vision and language encoders since they fundamentally represent the same physical world? Analyzing the latent spaces structure of vision and language models on image-caption benchmarks using the Centered Kernel Alignment (CKA), we find that the representation spaces of unaligned and aligned encoders are semantically similar. In the absence of statistical similarity in aligned encoders like CLIP, we show that a possible matching of unaligned encoders exists without any training. We frame this as a seeded graph-matching problem exploiting the semantic similarity between graphs and propose two methods - a Fast Quadratic Assignment Problem optimization, and a novel localized CKA metric-based matching/retrieval. We demonstrate the effectiveness of this on several downstream tasks including cross-lingual, cross-domain caption matching and image classification. Code available at ***/mayug/0-shot-llm-vision.

Keywords: CLIP; Unified Representations; Vision Language; Zero-shot
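
The record "Do Vision and Language Encoders Represent the World Similarly?" relies on Centered Kernel Alignment (CKA) to compare representation spaces. As a rough illustration only, the sketch below computes the linear variant of CKA in plain Python. The choice of the linear kernel and the function names (`center_columns`, `cross_sq_norm`, `linear_cka`) are assumptions made here for illustration; the listing does not say which kernel or implementation the authors used.

```python
import math


def center_columns(matrix):
    """Subtract the per-column mean so every feature has zero mean."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


def cross_sq_norm(a, b):
    """Squared Frobenius norm of a^T b for row-major matrices a and b."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y):
    """Linear CKA between two sets of features for the same n examples.

    x is n rows by d1 columns, y is n rows by d2 columns.
    The result lies in [0, 1]; 1 means the spaces match up to
    rotation and isotropic scaling.
    """
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_sq_norm(xc, yc)
    denominator = math.sqrt(cross_sq_norm(xc, xc) * cross_sq_norm(yc, yc))
    return numerator / denominator
```

For example, comparing a feature matrix with a rotated and rescaled copy of itself gives a value of 1 up to floating-point error, which is the invariance that makes CKA usable across encoders with different embedding bases.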

Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Buettner, Kyle; Malakouti, Sina; Li, Xiang Lorraine; Kovashka, Adriana

Affiliations: Univ Pittsburgh Intelligent Syst Program Pittsburgh PA 15260 USA; Univ Pittsburgh Dept Comp Sci Pittsburgh PA 15260 USA

ISBN: (print) 9798350353006

Existing object recognition models have been shown to lack robustness in diverse geographical scenarios due to domain shifts in design and context. Class representations need to be adapted to more accurately reflect an object concept under these shifts. In the absence of training data from target geographies, we hypothesize that geographically diverse descriptive knowledge of categories can enhance robustness. For this purpose, we explore the feasibility of probing a large language model for geography-based object knowledge, and we examine the effects of integrating knowledge into zero-shot and learnable soft prompting with CLIP. Within this exploration, we propose geography knowledge regularization to ensure that soft prompts trained on a source set of geographies generalize to an unseen target set. Accuracy gains over prompting baselines on DollarStreet while training only on Europe data are up to +2.8/1.2/1.6 on target data from Africa/Asia/Americas, and +4.6 overall on the hardest classes. Competitive performance is shown vs. few-shot target training, and analysis is provided to direct future study of geographical robustness.

Keywords: Zero-shot learning

SlowFormer: Adversarial Attack on Compute and Energy Consumption of Efficient Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Navaneet, K. L.; Koohpayegani, Soroush Abbasi; Sleiman, Essam; Pirsiavash, Hamed

Affiliations: Univ Calif Davis Davis CA 95616 USA; Harvard Univ Cambridge MA 02138 USA

ISBN: (print) 9798350353006

Recently, there has been a lot of progress in reducing the computation of deep models at inference time. These methods can reduce both the computational needs and power usage of deep models. Some of these approaches adaptively scale the compute based on the input instance. We show that such models can be vulnerable to a universal adversarial patch attack, where the attacker optimizes for a patch that when pasted on any image, can increase the compute and power consumption of the model. We run experiments with three different efficient vision transformer methods showing that in some cases, the attacker can increase the computation to the maximum possible level by simply pasting a patch that occupies only 8% of the image area. We also show that a standard adversarial training defense method can reduce some of the attack's success. We believe adaptive efficient methods will be necessary in the future to lower the power usage of expensive deep models, so we hope our paper encourages the community to study the robustness of these methods and develop better defense methods for the proposed attack. Code is available at: https://***/UCDvision/SlowFormer

Keywords: Adversarial attack; computation attack; efficient inference; efficient vision transformers; energy attack

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Khanna, Mukul; Ramrakhya, Ram; Chhablani, Gunjan; Yenamandra, Sriram; Gervet, Theophile; Chang, Matthew; Kiraly, Zsolt; Chaplot, Devendra Singh; Batra, Dhruv; Mottaghi, Roozbeh

Affiliations: Georgia Inst Technol Atlanta GA 30332 USA; Carnegie Mellon Univ Pittsburgh PA 15213 USA; Univ Illinois Urbana IL USA; Mistral AI Paris France; Univ Washington Seattle WA USA

ISBN: (print) 9798350353006

The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images. However, these navigation models often handle only a single input modality as the target. With the progress achieved so far, it is time to move towards universal navigation models capable of handling various goal types, enabling more effective user interaction with robots. To facilitate this goal, we propose GOAT-Bench, a benchmark for the universal navigation task referred to as GO to AnyThing (GOAT). In this task, the agent is directed to navigate to a sequence of targets specified by the category name, language description, or image in an open-vocabulary fashion. We benchmark monolithic RL and modular methods on the GOAT task, analyzing their performance across modalities, the role of explicit and implicit scene memories, their robustness to noise in goal specifications, and the impact of memory in lifelong scenarios.

Keywords: computer vision; Embodied AI; Visual navigation

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kang, Taeho; Lee, Youngki

Affiliations: Seoul Natl Univ Seoul South Korea

ISBN: (print) 9798350353013; 9798350353006

We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation. Severe self-occlusion and out-of-view limbs in egocentric camera views make accurate pose estimation a challenging problem. To address the challenge, prior methods employ joint heatmaps (probabilistic 2D representations of the body pose), but heatmap-to-3D pose conversion still remains an inaccurate process. We propose a novel heatmap-to-3D lifting method composed of the Grid ViT Encoder and the Propagation Network. The Grid ViT Encoder summarizes joint heatmaps into effective feature embedding using self-attention. Then, the Propagation Network estimates the 3D pose by utilizing skeletal information to better estimate the position of obscure joints. Our method significantly outperforms the previous state-of-the-art qualitatively and quantitatively demonstrated by a 23.9% reduction of error in an MPJPE metric. Our source code is available on GitHub (1).

Keywords: 3D pose estimation; egocentric vision; stereo vision
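
The EgoTAP record reports its gain as a reduction in MPJPE (mean per-joint position error), which is the average Euclidean distance between predicted and ground-truth joint positions. A minimal sketch of that metric follows, assuming joints are given as `(x, y, z)` tuples in matching order. The helper `relative_reduction` and all numbers in the usage note are hypothetical and only illustrate the arithmetic; they are not results from the paper.

```python
import math


def mpjpe(predicted, ground_truth):
    """Mean Euclidean distance between matching 3D joints.

    Both arguments are sequences of (x, y, z) tuples in the same joint
    order; the result is in the same unit as the inputs.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("joint lists must be non-empty and equally long")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    return (baseline_error - new_error) / baseline_error
```

With two joints that are off by 3 and 4 units, `mpjpe` returns 3.5. A hypothetical drop from an error of 100.0 to 76.1 corresponds to `relative_reduction(100.0, 76.1)`, about 0.239, which is how a figure such as "23.9% reduction" is read.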

MAFA: Managing False Negatives for Vision-Language Pre-training

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Byun, Jaeseok; Kim, Dohoon; Moon, Taesup

Affiliations: Seoul Natl Univ Dept ECE Seoul South Korea; Seoul Natl Univ Dept ASRI INMC IPAI AIIS Seoul South Korea

ISBN: (print) 9798350353006

We consider a critical issue of false negatives in Vision-Language Pre-training (VLP), a challenge that arises from the inherent many-to-many correspondence of image-text pairs in large-scale web-crawled datasets. The presence of false negatives can impede achieving optimal performance and even lead to a significant performance drop. To address this challenge, we propose MAFA (MAnaging FAlse negatives), which consists of two pivotal components building upon the recently developed GRouped mIni-baTch sampling (GRIT) strategy: 1) an efficient connection mining process that identifies and converts false negatives into positives, and 2) label smoothing for the image-text contrastive (ITC) loss. Our comprehensive experiments verify the effectiveness of MAFA across multiple downstream tasks, emphasizing the crucial role of addressing false negatives in VLP, potentially even surpassing the importance of addressing false positives. In addition, the compatibility of MAFA with the recent BLIP-family model is also demonstrated. Code is available at https://***/jaeseokbyun/MAFA.

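
The MAFA record names label smoothing for the image-text contrastive (ITC) loss as one of its two components. The sketch below shows one generic way such a loss could look: a symmetric cross-entropy over an image-text similarity matrix with uniform label smoothing. The uniform smoothing scheme, the default `temperature` and `epsilon` values, and the function names are assumptions for illustration; the listing does not describe the authors' exact formulation.

```python
import math


def smoothed_cross_entropy(logits, target_index, epsilon=0.1):
    """Cross-entropy against a target that keeps 1 - epsilon on the true
    index and spreads epsilon uniformly over all indices."""
    n = len(logits)
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    log_probs = [v - log_z for v in logits]
    targets = [epsilon / n] * n
    targets[target_index] += 1.0 - epsilon
    return -sum(t * lp for t, lp in zip(targets, log_probs))


def itc_loss(similarity, temperature=0.07, epsilon=0.1):
    """Symmetric contrastive loss for a square similarity matrix in which
    entry [i][j] scores image i against text j and pair (i, i) matches."""
    n = len(similarity)
    image_to_text = sum(
        smoothed_cross_entropy([s / temperature for s in similarity[i]], i, epsilon)
        for i in range(n)
    ) / n
    text_to_image = sum(
        smoothed_cross_entropy(
            [similarity[j][i] / temperature for j in range(n)], i, epsilon
        )
        for i in range(n)
    ) / n
    return 0.5 * (image_to_text + text_to_image)
```

With `epsilon=0` this reduces to the usual hard-label contrastive loss. A positive `epsilon` penalizes a model less for assigning some probability to off-diagonal pairs, which is the intuition for using smoothing when some "negatives" in a batch may actually match.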

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Leng, Sicong; Zhang, Hang; Chen, Guanzheng; Li, Xin; Lu, Shijian; Miao, Chunyan; Bing, Lidong

Affiliations: Alibaba Grp DAMO Acad Hangzhou Peoples R China; Nanyang Technol Univ Singapore Singapore; Hupan Lab Hangzhou 310023 Peoples R China

ISBN: (print) 9798350353006

Large Vision-Language Models (LVLMs) have advanced considerably, intertwining visual recognition and language understanding to generate content that is not only coherent but also contextually attuned. Despite their success, LVLMs still suffer from the issue of object hallucinations, where models generate plausible yet incorrect outputs that include objects that do not exist in the images. To mitigate this issue, we introduce Visual Contrastive Decoding (VCD), a simple and training-free method that contrasts output distributions derived from original and distorted visual inputs. The proposed VCD effectively reduces the over-reliance on statistical bias and unimodal priors, two essential causes of object hallucinations. This adjustment ensures the generated content is closely grounded to visual inputs, resulting in contextually accurate outputs. Our experiments show that VCD, without either additional training or the usage of external tools, significantly mitigates the object hallucination issue across different LVLM families. Beyond mitigating object hallucinations, VCD also excels in general LVLM benchmarks, highlighting its wide-ranging applicability.

Keywords: Large Multimodal Models; Multimodality; Vision and Language
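
The Visual Contrastive Decoding record describes contrasting output distributions obtained from the original and a distorted visual input. The sketch below shows one common way to contrast two sets of next-token logits, assuming a weight `alpha` and the combination `(1 + alpha) * original - alpha * distorted`. That formula, the parameter name, and the toy logits in the usage note are assumptions for illustration; the listing itself does not spell out the combination rule.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_distribution(logits_original, logits_distorted, alpha=1.0):
    """Next-token distribution that boosts tokens supported by the clean
    image and suppresses tokens that stay likely when the image is
    distorted (i.e. tokens driven mostly by language priors)."""
    contrasted = [
        (1.0 + alpha) * o - alpha * d
        for o, d in zip(logits_original, logits_distorted)
    ]
    return softmax(contrasted)
```

In a toy three-token vocabulary with original logits `[2.0, 1.9, 0.0]` and distorted logits `[0.5, 1.9, 0.0]`, the second token is equally likely with or without a usable image, so the contrast shifts probability toward the first token, whose score depends on the visual input. Setting `alpha=0` recovers ordinary decoding from the original logits.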


Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021
