检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

分类表

所选分类

>> <<

限定检索结果

标题

标题
作者
主题词
出版物名称
出版社
机构
学科分类号
摘要
ISBN
ISSN
基金资助
索书号

作者

作者
标题
主题词
出版物名称
出版社
机构
学科分类号
摘要
ISBN
ISSN
基金资助
索书号

文献类型

23,001 篇 会议
126 册 图书
92 篇 期刊文献

馆藏范围

23,218 篇 电子文献
1 种 纸本馆藏

日期分布

学科分类号

13,623 篇 工学
- 11,108 篇 计算机科学与技术...
- 3,479 篇 软件工程
- 2,445 篇 机械工程
- 1,716 篇 光学工程
- 1,075 篇 电气工程
- 1,014 篇 控制科学与工程
- 785 篇 信息与通信工程
- 412 篇 仪器科学与技术
- 352 篇 生物工程
- 251 篇 生物医学工程（可授...
- 196 篇 电子科学与技术（可...
- 114 篇 化学工程与技术
- 108 篇 安全科学与工程
- 100 篇 测绘科学与技术
- 88 篇 建筑学
- 87 篇 交通运输工程
- 84 篇 土木工程
3,494 篇 医学
- 3,481 篇 临床医学
- 81 篇 基础医学(可授医学...
3,242 篇 理学
- 1,939 篇 物理学
- 1,640 篇 数学
- 563 篇 统计学（可授理学、...
- 500 篇 生物学
- 249 篇 系统科学
- 107 篇 化学
522 篇 管理学
- 311 篇 图书情报与档案管...
- 224 篇 管理科学与工程(可...
- 76 篇 工商管理
276 篇 艺术学
- 276 篇 设计学（可授艺术学...
66 篇 法学
- 63 篇 社会学
38 篇 农学
28 篇 教育学
22 篇 经济学
10 篇 军事学
3 篇 文学

主题

10,187 篇 computer vision
3,967 篇 pattern recognit...
3,005 篇 training
2,007 篇 computational mo...
1,818 篇 visualization
1,815 篇 cameras
1,516 篇 feature extracti...
1,481 篇 shape
1,455 篇 three-dimensiona...
1,438 篇 image segmentati...
1,287 篇 robustness
1,205 篇 computer archite...
1,155 篇 semantics
1,147 篇 conferences
1,107 篇 layout
1,092 篇 computer science
1,087 篇 object detection
1,025 篇 benchmark testin...
970 篇 codes
922 篇 face recognition

机构

136 篇 univ sci & techn...
121 篇 univ chinese aca...
118 篇 chinese univ hon...
107 篇 carnegie mellon ...
101 篇 tsinghua univers...
101 篇 microsoft resear...
95 篇 swiss fed inst t...
93 篇 zhejiang univ pe...
82 篇 university of sc...
81 篇 zhejiang univers...
80 篇 university of ch...
77 篇 shanghai ai lab ...
72 篇 shanghai jiao to...
69 篇 national laborat...
67 篇 microsoft res as...
67 篇 alibaba grp peop...
64 篇 adobe research
61 篇 tsinghua univ pe...
60 篇 peking univ peop...
59 篇 univ oxford oxfo...

作者

81 篇 van gool luc
72 篇 timofte radu
64 篇 zhang lei
47 篇 luc van gool
40 篇 yang yi
40 篇 li stan z.
37 篇 loy chen change
34 篇 chen chen
33 篇 xiaoou tang
32 篇 liu yang
32 篇 qi tian
31 篇 tian qi
31 篇 sun jian
30 篇 murino vittorio
30 篇 pascal fua
29 篇 darrell trevor
29 篇 li fei-fei
28 篇 li xin
28 篇 ying shan
27 篇 vasconcelos nuno

语言

23,137 篇 英文
53 篇 其他
22 篇 中文
5 篇 土耳其文
2 篇 日文

检索条件"任意字段=IEEE Conference on Computer Vision and Pattern Recognition Workshops"

共 23219 条记录，以下是431-440 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

相关度排序

相关度排序
时效性降序
时效性升序

SHViT: Single-Head vision Transformer with Memory Efficient Macro Design

SHViT: Single-Head Vision Transformer with Memory Efficient ...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Yun, Seokju Ro, Youngmin Univ Seoul Machine Intelligence Lab Seoul South Korea

ISBN: (纸本)9798350353013;9798350353006

Recently, efficient vision Transformers have shown great performance with low latency on resource-constrained devices. Conventionally, they use 4x4 patch embeddings and a 4-stage structure at the macro level, while utilizing sophisticated attention with multi-head configuration at the micro level. This paper aims to address computational redundancy at all design levels in a memory-efficient manner. We discover that using larger-stride patchify stem not only reduces memory access costs but also achieves competitive performance by leveraging token representations with reduced spatial redundancy from the early stages. Furthermore, our preliminary analyses suggest that attention layers in the early stages can be substituted with convolutions, and several attention heads in the latter stages are computationally redundant. To handle this, we introduce a single-head attention module that inherently prevents head redundancy and simultaneously boosts accuracy by parallelly combining global and local information. Building upon our solutions, we introduce SHViT, a SingleHead vision Transformer that obtains the state-of-the-art speed-accuracy tradeoff. For example, on ImageNet-1k, our SHViT-S4 is 3.3x, 8.1x, and 2.4x faster than MobileViTv2 x1.0 on GPU, CPU, and iPhone12 mobile device, respectively, while being 1.3% more accurate. For object detection and instance segmentation on MS COCO using MaskRCNN head, our model achieves performance comparable to FastViT-SA12 while exhibiting 3.8x and 2.0x lower backbone latency on GPU and mobile device, respectively.

关键词： CNNs efficient visual backbone vision Transformer

来源：评论

学校读者我要写书评

暂无评论

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis

GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Zheng, Shunyuan Zhou, Boyao Shao, Ruizhi Liu, Boning Zhang, Shengping Nie, Liqiang Liu, Yebin Harbin Inst Technol Harbin Peoples R China Tsinghua Univ Beijing Peoples R China Peng Cheng Lab Shenzhen Peoples R China

ISBN: (纸本)9798350353006

We present a new approach, termed GPS-Gaussian, for synthesizing novel views of a character in a real-time manner. The proposed method enables 2K-resolution rendering under a sparse-view camera setting. Unlike the original Gaussian Splatting or neural implicit rendering methods that necessitate per-subject optimizations, we introduce Gaussian parameter maps defined on the source views and regress directly Gaussian Splatting properties for instant novel view synthesis without any fine-tuning or optimization. To this end, we train our Gaussian parameter regression module on a large amount of human scan data, jointly with a depth estimation module to lift 2D parameter maps to 3D space. The proposed framework is fully differentiable and experiments on several datasets demonstrate that our method outperforms state-of-the-art methods while achieving an exceeding rendering speed. The code is available at https://***/aipixel/GPS-Gaussian.

关键词： Rendering (computer graphics)

来源：评论

学校读者我要写书评

暂无评论

Curriculum Learning for Data-Efficient vision-Language Alignment

Curriculum Learning for Data-Efficient Vision-Language Align...

引用

2023 ieee/CVF conference on computer vision and pattern recognition workshops, CVPRW 2023

作者： Srinivasan, Tejas Ren, Xiang Thomason, Jesse University of Southern California United States

ISBN: (纸本)9798350302493

Aligning image and text encoders from scratch using contrastive learning requires large amounts of paired image-text data. We alleviate this need by aligning individually pre-trained language and vision representation models using a much smaller amount of paired data with a curriculum learning algorithm to learn fine-grained vision-language alignments. TOnICS (Training with Ontology-Informed Contrastive Sampling) initially samples minibatches whose image-text pairs contain a wide variety of objects to learn object-level vision-language alignment, and progressively samples minibatches where all image-text pairs contain the same object to learn finer-grained contextual alignment. Aligning pre-trained BERT and VinVL-OD models to each other using TOnICS outperforms CLIP on downstream zero-shot image retrieval using © 2023 ieee.

关键词： Curricula

来源：评论

学校读者我要写书评

暂无评论

On Scaling up a Multilingual vision and Language Model

On Scaling up a Multilingual Vision and Language Model

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Chen, Xi Djolonga, Josip Padlewski, Piotr Mustafa, Basil Changpinyo, Soravit Wu, Jialin Ruiz, Carlos Riquelme Goodman, Sebastian Wang, Xiao Tay, Yi Shakeri, Siamak Dehghani, Mostafa Salz, Daniel Lucic, Mario Tschannen, Michael Nagrani, Arsha Hu, Hexiang Joshi, Mandar Pang, Bo Montgomery, Ceslee Pietrzyk, Paulina Ritter, Marvin Piergiovanni, A. J. Minderer, Matthias Pavetic, Filip Waters, Austin Li, Gang Alabdulmohsin, Ibrahim Beyer, Lucas Amelot, Julien Lee, Kenton Steiner, Andreas Peter Li, Yang Keysers, Daniel Arnab, Anurag Xu, Yuanzhong Rong, Keran Kolesnikov, Alexander Seyedhosseini, Mojtaba Angelova, Anelia Zhai, Xiaohua Houlsby, Neil Soricut, Radu Google Mountain View CA 94043 USA

ISBN: (纸本)9798350353006

We explore the boundaries of scaling up a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide-range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. Our model advances the state-of-the-art on most vision-and-language benchmarks considered (20+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.

关键词： language multimodal pretraining vision

来源：评论

学校读者我要写书评

暂无评论

A Transformer-based Late-Fusion Mechanism for Fine-Grained Object recognition in Videos

A Transformer-based Late-Fusion Mechanism for Fine-Grained O...

引用

23rd ieee/CVF Winter conference on Applications of computer vision (WACV)

作者： Koch, Jannik Wolf, Stefan Beyerer, Juergen Fraunhofer IOSB Karlsruhe Germany Karlsruhe Inst Technol Vis & Future Lab Karlsruhe Germany Fraunhofer Ctr Machine Learning Munich Germany

ISBN: (纸本)9798350320565

Fine-grained image classification is limited by only considering a single view while in many cases, like surveillance, a whole video exists which provides multiple perspectives. However, the potential of videos is mostly considered in the context of action recognition while finegrained object recognition is rarely considered as an application for video classification. This leads to recent video classification architectures being inappropriate for the task of fine-grained object recognition. We propose a novel, Transformer-based late-fusion mechanism for finegrained video classification. Our approach achieves superior results to both early-fusion mechanisms, like the Video Swin Transformer, and a simple consensus-based late-fusion baseline with a modern Swin Transformer backbone. Additionally, we achieve improved efficiency, as our results show a high increase in accuracy with only a slight increase in computational complexity. Code is available at: https://***/wolfstefan/tlf.

关键词： Costs Surveillance computer architecture Streaming media Transformers Real-time systems Consensus protocol

来源：评论

学校读者我要写书评

暂无评论

Language-driven Grasp Detection

Language-driven Grasp Detection

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： An Dinh Vuong Minh Nhat Vu Baoru Huang Nghia Nguyen Hieu Le Thieu Vo Anh Nguyen FPT Software AI Ctr Hanoi Vietnam TU Wien Automat Control Inst Vienna Austria Imperial Coll London London England Ton Duc Thang Univ Ho Chi Minh City Vietnam Univ Liverpool Liverpool Merseyside England

ISBN: (数字)9798350353006

ISBN: (纸本)9798350353006

Grasp detection is a persistent and intricate challenge with various industrial applications. Recently, many methods and datasets have been proposed to tackle the grasp detection problem. However, most of them do not consider using natural language as a condition to detect the grasp poses. In this paper, we introduce Grasp-Anything++, a new language- driven grasp detection dataset featuring 1M samples, over 3M objects, and upwards of 10M grasping instructions. We utilize foundation models to create a large-scale scene corpus with corresponding images and grasp prompts. We approach the language-driven grasp detection task as a conditional generation problem. Drawing on the success of diffusion models in generative tasks and given that language plays a vital role in this task, we propose a new language-driven grasp detection method based on diffusion models. Our key contribution is the contrastive training objective, which explicitly contributes to the denoising process to detect the grasp pose given the language instructions. We illustrate that our approach is theoretically supportive. The intensive experiments show that our method outperforms state-of-the-art approaches and allows real-world robotic grasping. Finally, we demonstrate our large-scale dataset enables zero-short grasp detection and is a challenging benchmark for future work.

关键词： Training computer vision Noise reduction Natural languages Grasping Benchmark testing Diffusion models

来源：评论

学校读者我要写书评

暂无评论

OutfitTransformer: Outfit Representations for Fashion Recommendation

OutfitTransformer: Outfit Representations for Fashion Recomm...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Sarkar, Rohan Bodla, Navaneeth Vasileva, Mariya Lin, Yen-Liang Beniwal, Anurag Lu, Alan Medioni, Gerard Purdue Univ W Lafayette IN 47907 USA Amazon Seattle WA 98109 USA

ISBN: (数字)9781665487399

ISBN: (纸本)9781665487399

Predicting outfit compatibility and retrieving complementary items are critical components for a fashion recommendation system. We present a scalable framework, Out-fitTransformer, that learns compatibility of the entire out-fit and supports large-scale complementary item retrieval. We model outfits as an unordered set of items and leverage self-attention mechanism to learn the relationships between items. We train the framework using a proposed set-wise outfit ranking loss to generate a target item embedding given an outfit, and a target item specification. The generated target item embedding is then used to retrieve compatible items that match the outfit. Experimental results demonstrate that our approach outperforms state-of-the-art methods on compatibility prediction, fill-in-the-blank, and complementary item retrieval tasks.

关键词： computer vision conferences Task analysis Recommender systems pattern matching

来源：评论

学校读者我要写书评

暂无评论

Explaining CLIP's performance disparities on data from blind/low vision users

Explaining CLIP's performance disparities on data from blind...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Massiceti, Daniela Longden, Camilla Slowik, Agnieszka Wills, Samuel Grayson, Martin Morrison, Cecily Microsoft Res Redmond WA 98052 USA World Bank 1818 H St NW Washington DC 20433 USA

ISBN: (纸本)9798350353006

Large multi-modal models (LMMs) hold the potential to usher in a new era of automated visual assistance for people who are blind or low vision (BLV). Yet, these models have not been systematically evaluated on data captured by BLV users. We address this by empirically assessing CLIP, a widely-used LMM likely to underpin many assistive technologies. Testing 25 CLIP variants in a zero-shot classification task, we find that their accuracy is 15 percentage points lower on average for images captured by BLV users than web-crawled images. This disparity stems from CLIP's sensitivities to 1) image content (e.g. not recognizing disability objects as well as other objects);2) image quality (e.g. not being robust to lighting variation);and 3) text content (e.g. not recognizing objects described by tactile adjectives as well as visual ones). We delve deeper with a textual analysis of three common pre-training datasets: LAION-400M, LAION-2B and DataComp-1B, showing that disability content is rarely mentioned. We then provide three examples that illustrate how the performance disparities extend to three downstream models underpinned by CLIP: OWL-ViT, CLIPSeg and DALL-E2. We find that few-shot learning with as few as 5 images can mitigate CLIP's quality-of-service disparities for BLV users in some scenarios, which we discuss alongside a set of other possible mitigations.

关键词： accessibility blind CLIP fairness low vision multi-modal vision-language zero-shot classification

来源：评论

学校读者我要写书评

暂无评论

PracticalDG: Perturbation Distillation on vision-Language Models for Hybrid Domain Generalization

PracticalDG: Perturbation Distillation on Vision-Language Mo...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Chen, Zining Wang, Weiqiu Zhao, Zhicheng Su, Fei Men, Aidong Meng, Hongying Beijing Univ Posts & Telecommun Sch Artificial Intelligence Beijing Peoples R China Beijing Key Lab Network Syst & Network Culture Beijing Peoples R China Minist Culture & Tourism Key Lab Interact Technol & Experience Syst Beijing Peoples R China Brunel Univ Uxbridge Uxbridge Middx England

ISBN: (纸本)9798350353006

Domain Generalization (DG) aims to resolve distribution shifts between source and target domains, and current DG methods are default to the setting that data from source and target domains share identical categories. Nevertheless, there exists unseen classes from target domains in practical scenarios. To address this issue, Open Set Domain Generalization (OSDG) has emerged and several methods have been exclusively proposed. However, most ex-isting methods adopt complex architectures with slight improvement compared with DG methods. Recently, vision-language models (VLMs) have been introduced in DG following the fine-tuning paradigm, but consume huge training overhead with large vision models. Therefore, in this paper, we innovate to transfer knowledge from VLMs to lightweight vision models and improve the robustness by introducing Perturbation Distillation (PD) from three perspectives, including Score, Class and Instance (SCI), named SCI- PD. Moreover, previous methods are oriented by the benchmarks with identical and fixed splits, ignoring the divergence between source domains. These methods are revealed to suffer from sharp performance decay with our proposed new benchmark Hybrid Domain Generalization (HDG) and a novel metric H-2-CV, which construct various splits to comprehensively assess the robustness of algorithms. Extensive experiments demonstrate that our method outperforms state-of-the-art algorithms on multiple datasets, especially improving the robustness when con-fronting data scarcity.

关键词： Visual languages

来源：评论

学校读者我要写书评

暂无评论

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Gene...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Liu, Zhixuan Schaldenbrand, Peter Okogwu, Beverley-Claire Peng, Wenxuan Yun, Youngsik Hundt, Andrew Kim, Jihie Oh, Jean Carnegie Mellon Univ Pittsburgh PA 15213 USA Nanyang Technol Univ Singapore Singapore Dongguk Univ Seoul South Korea

ISBN: (纸本)9798350353006

Accurate representation in media is known to improve the well-being of the people who consume it. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of cultures. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset that we call the Cross-Cultural Under-standing Benchmark (CCUB) and (2) proposing a novel Self- Contrastive Fine-Tuning (SCoFT, pronounced /soft/) method that leverages the model's known biases to self-improve. SCoFT is designed to prevent overfitting on small datasets, encode only high-level information from the data, and shift the generated distribution away from misrepresentations encoded in a pretrained model. Our user study conducted on 51 participants from 5 different countries based on their self-selected national cultural affiliation shows that fine-tuning on CCUB consistently generates images with higher cultural relevance and fewer stereotypes when compared to the Stable Diffusion baseline, which is further improved with our SCoFT technique. Resources and code are at https://***/SCoFT.

关键词： computer vision for Social Good Image Synthesis

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共500页 << < 40 41 42 43 44 45 46 47 48 49 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：