ISBN: (Print) 9798350353006
Recent literature has demonstrated that vision transformers (ViTs) exhibit superior performance compared to convolutional neural networks (CNNs). The majority of recent research on adversarial robustness, however, has predominantly focused on CNNs. In this work, we bridge this gap by analyzing the effectiveness of existing attacks on ViTs. We demonstrate that, due to the softmax computations in every attention block, ViTs are inherently vulnerable to floating point underflow errors. This can lead to a gradient masking effect, resulting in suboptimal attack strength for well-known attacks such as PGD, Carlini and Wagner (CW), and GAMA. Motivated by this, we propose the Adaptive Attention Scaling (AAS) attack, which automatically finds the optimal scaling factors of pre-softmax outputs using gradient-based optimization. We show that this simple strategy can be incorporated into any existing adversarial attack as well as adversarial training method and achieves improved performance. On ViT-B16, we demonstrate an improved attack strength of up to 2.2% on CIFAR10 and up to 2.9% on CIFAR100 by incorporating the proposed AAS attack into state-of-the-art single-attack methods such as GAMA. Further, we apply the proposed AAS attack every few epochs within existing adversarial training methods, which we term Adaptive Attention Scaling Adversarial Training (AAS-AT). Incorporating AAS-AT into existing methods outperforms them on ViTs by 1.3-3.5% on CIFAR10. We observe improved performance on ImageNet-100 as well.
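The abstract does not give the AAS formulation, so the following is only a minimal sketch of the idea it describes: a learnable scale on the pre-softmax attention logits is optimized jointly with the adversarial perturbation by gradient ascent. The `model(x, scales)` interface, the step sizes, and the update rule are assumptions, not the authors' implementation; `scales` is expected to be a tensor created with `requires_grad=True`.

```python
import torch
import torch.nn.functional as F

def scaled_attention(q, k, v, scale):
    # Scaling the pre-softmax logits can keep small attention scores from
    # underflowing to exact zeros, which would otherwise mask gradients.
    logits = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    attn = F.softmax(scale * logits, dim=-1)
    return attn @ v

def aas_style_attack(model, scales, x, y, eps=8 / 255, steps=10, step_size=2 / 255):
    # `model(x, scales)` is a hypothetical interface: a ViT whose attention
    # blocks apply the learnable `scales` to their pre-softmax logits.
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta, scales), y)
        g_delta, g_scales = torch.autograd.grad(loss, [delta, scales])
        with torch.no_grad():
            delta += step_size * g_delta.sign()   # PGD step on the input
            delta.clamp_(-eps, eps)
            scales += 0.1 * g_scales              # gradient ascent on the scales
            scales.clamp_(min=1e-3)               # keep scales positive
    return (x + delta).detach()
```

Because the scale update only touches the forward pass of the attention blocks, the same loop structure would compose with other PGD-style attacks, which is consistent with the abstract's claim that AAS can be combined with existing attacks.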
ISBN: (Print) 9798350353006
Action Localization is a challenging problem that combines detection and recognition tasks, which are often addressed separately. State-of-the-art methods rely on off-the-shelf bounding box detections pre-computed at high resolution and propose transformer models that focus on the classification task alone. Such two-stage solutions are prohibitive for real-time deployment. On the other hand, single-stage methods target both tasks by devoting part of the network (generally the backbone) to sharing the majority of the workload, trading performance for speed. These methods add a DETR head with learnable queries that, after cross- and self-attention, are sent to corresponding MLPs for detecting a person's bounding box and action. However, DETR-like architectures are challenging to train and can incur significant complexity. In this paper, we observe that a bipartite matching loss can be applied directly to the output tokens of a vision transformer. This results in a backbone + MLP architecture that can perform both tasks without the need for an extra encoder-decoder head or learnable queries. We show that a single MViTv2-S architecture trained with bipartite matching to perform both tasks surpasses the same MViTv2-S trained with RoI align on pre-computed bounding boxes. With a careful design of token pooling and the proposed training pipeline, our Bipartite-Matching Vision Transformer model, BMViT, achieves +3 mAP on AVA2.2 w.r.t. the two-stage MViTv2-S counterpart. Code is available at https://***/IoannaNti/BMViT
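As a rough illustration of applying a bipartite matching loss directly to transformer output tokens, the sketch below Hungarian-matches token predictions (box plus action logits) against ground truth and sums an L1 box loss with a classification loss. The cost weights, the omitted "no-object" term for unmatched tokens, and the function names are assumptions rather than the paper's exact recipe.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def match_tokens_to_targets(pred_boxes, pred_logits, gt_boxes, gt_labels):
    # pred_boxes: (T, 4), pred_logits: (T, C); gt_boxes: (G, 4), gt_labels: (G,)
    cls_cost = -pred_logits.softmax(-1)[:, gt_labels]   # (T, G) negative class prob
    box_cost = torch.cdist(pred_boxes, gt_boxes, p=1)   # (T, G) L1 box distance
    cost = (cls_cost + 5.0 * box_cost).detach().cpu()
    rows, cols = linear_sum_assignment(cost.numpy())    # Hungarian matching
    return torch.as_tensor(rows), torch.as_tensor(cols)

def bipartite_matching_loss(pred_boxes, pred_logits, gt_boxes, gt_labels):
    rows, cols = match_tokens_to_targets(pred_boxes, pred_logits, gt_boxes, gt_labels)
    loss_box = F.l1_loss(pred_boxes[rows], gt_boxes[cols])
    loss_cls = F.cross_entropy(pred_logits[rows], gt_labels[cols])
    # A full implementation would also push unmatched tokens toward a
    # "no object" class, as in DETR; that term is omitted here for brevity.
    return loss_cls + 5.0 * loss_box
```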
ISBN: (Print) 9798350353013; 9798350353006
Image denoising approaches based on deep neural networks often struggle with overfitting to the specific noise distributions present in training data. This challenge persists in existing real-world denoising networks, which are trained on a limited spectrum of real noise distributions and thus show poor robustness to out-of-distribution real noise types. To alleviate this issue, we develop a novel training framework called Adversarial Frequency Mixup (AFM). AFM leverages mixup in the frequency domain to generate noisy images with distinctive and challenging noise characteristics, all the while preserving the properties of authentic real-world noise. Incorporating these noisy images into the training pipeline then enhances the denoising network's robustness to variations in noise distributions. Extensive experiments and analyses, conducted on a wide range of real noise benchmarks, demonstrate that denoising networks trained with our proposed framework exhibit significant improvements in robustness to unseen noise distributions. The code is available at https://***/dhryougit/AFM.
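A minimal sketch of a frequency-domain mixup between a real noisy image and its clean counterpart, in the spirit of AFM: the paper optimizes the mixing mask adversarially against the denoiser, whereas here the mask is simply supplied by the caller, and the shapes are invented for the example.

```python
import torch

def frequency_mixup(noisy, clean, mask):
    # noisy, clean: (B, C, H, W); mask: (H, W) in [0, 1], broadcast over batch
    # and channels. Each frequency bin is blended between the noisy and clean
    # spectra, then mapped back to the image domain.
    f_noisy = torch.fft.fft2(noisy)
    f_clean = torch.fft.fft2(clean)
    mixed = mask * f_noisy + (1.0 - mask) * f_clean
    return torch.fft.ifft2(mixed).real

# Usage with a random per-frequency mask (AFM instead optimizes the mask to
# maximize the denoiser's loss before adding the result to training):
noisy, clean = torch.rand(2, 1, 3, 64, 64)
mask = torch.rand(64, 64)
augmented = frequency_mixup(noisy, clean, mask)
```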
ISBN: (Print) 9798350353006
Recent breakthroughs in vision-language models (VLMs) have opened a new chapter for the vision community. VLMs provide stronger and more generalizable feature embeddings than ImageNet-pretrained models, thanks to training on large-scale Internet image-text pairs. However, despite these achievements, vanilla vision Transformers (ViTs) remain the default choice for the image encoder. Although the pure transformer has proven its effectiveness for text encoding, it remains questionable whether this also holds for image encoding, especially considering that the various network types proposed on the ImageNet benchmark are rarely studied in VLMs. Because of the small data/model scale, the original conclusions about model design drawn on ImageNet can be limited and biased. In this paper, we aim to build an evaluation protocol for vision models in the vision-language era under the contrastive language-image pretraining (CLIP) framework. We provide a comprehensive way to benchmark different vision models, covering their zero-shot performance and scalability in both model and training data sizes. To this end, we introduce ViTamin, new vision models tailored for VLMs. ViTamin-L significantly outperforms ViT-L by 2.0% ImageNet zero-shot accuracy when using the same publicly available DataComp-1B dataset and the same OpenCLIP training scheme. ViTamin-L presents promising results on 60 diverse benchmarks, including classification, retrieval, open-vocabulary detection and segmentation, and large multi-modal models. When further scaling up the model size, our ViTamin-XL with only 436M parameters attains 82.9% ImageNet zero-shot accuracy, surpassing the 82.0% achieved by EVA-E, which has ten times more parameters (4.4B).
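For context, the zero-shot classification protocol used in CLIP-style benchmarking can be sketched as below: class names become text prompts, and each image is assigned the class whose text embedding is most cosine-similar to the image embedding. The `encode_image`/`encode_text`/tokenizer interface mirrors common OpenCLIP usage but is an assumption here, not ViTamin-specific code.

```python
import torch

@torch.no_grad()
def zero_shot_accuracy(model, tokenizer, loader, class_names, device="cuda"):
    # Build one text embedding per class from a simple prompt template.
    prompts = tokenizer([f"a photo of a {c}" for c in class_names]).to(device)
    text = model.encode_text(prompts)
    text = text / text.norm(dim=-1, keepdim=True)

    correct = total = 0
    for images, labels in loader:
        img = model.encode_image(images.to(device))
        img = img / img.norm(dim=-1, keepdim=True)
        pred = (img @ text.T).argmax(dim=-1).cpu()   # nearest text embedding
        correct += (pred == labels).sum().item()
        total += labels.numel()
    return correct / total
```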
ISBN: (Print) 9798350353006
Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLMs) have demonstrated remarkable performance on certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships between physical objects, like distances or size differences. We hypothesize that VLMs' limited spatial reasoning capability is due to the lack of 3D spatial knowledge in training data, and we aim to solve this problem by training VLMs with Internet-scale spatial reasoning data. To this end, we present a system to facilitate this approach. We first develop an automatic 3D spatial VQA data generation framework that scales up to 2 billion VQA examples on 10 million real-world images. We then investigate various factors in the training recipe, including data quality, training pipeline, and VLM architecture. Our work features the first Internet-scale 3D spatial reasoning dataset in metric space. By training a VLM on such data, we significantly enhance its ability on both qualitative and quantitative spatial VQA. Finally, we demonstrate that this VLM unlocks novel downstream applications in chain-of-thought spatial reasoning and robotics thanks to its quantitative estimation capability. Website: https://***/
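A toy illustration of template-based, metric-space QA generation from 3D object centroids, loosely following the data pipeline described above. The object list, question templates, and rounding are invented for the example; the real framework first lifts objects from 2D detections and depth before a step like this.

```python
import itertools
import math

def spatial_qa_pairs(objects):
    # objects: list of (name, (x, y, z)) with coordinates in meters.
    qa = []
    for (name_a, pos_a), (name_b, pos_b) in itertools.combinations(objects, 2):
        dist = math.dist(pos_a, pos_b)
        qa.append((f"How far apart are the {name_a} and the {name_b}?",
                   f"About {dist:.1f} meters."))
        qa.append((f"Is the {name_a} to the left of the {name_b}?",
                   "Yes." if pos_a[0] < pos_b[0] else "No."))
    return qa

# Example with two hypothetical objects:
print(spatial_qa_pairs([("chair", (0.0, 0.0, 2.0)), ("table", (1.5, 0.0, 2.0))]))
```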
ISBN: (Print) 9798350353006
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of categories in segmentation datasets. Consequently, the vocabulary capacity of pre-trained VLMs is severely reduced after fine-tuning. However, without fine-tuning, VLMs trained under weak image-text supervision tend to make suboptimal mask predictions. To alleviate these issues, we introduce a novel recurrent framework that progressively filters out irrelevant texts and enhances mask quality without any training effort. The recurrent unit is a two-stage segmenter built upon a frozen VLM. Thus, our model retains the VLM's broad vocabulary space while equipping it with segmentation ability. Experiments show that our method outperforms not only the training-free counterparts but also those fine-tuned with millions of data samples, and sets new state-of-the-art records for both zero-shot semantic and referring segmentation. Concretely, we improve the current record by 28.8, 16.0, and 6.9 mIoU on Pascal VOC, COCO Object, and Pascal Context, respectively.
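The recurrent filtering idea can be sketched as a simple fixed-point loop: a frozen VLM-based segmenter scores each candidate text, low-scoring texts are dropped, and the unit is re-applied until the text set stabilizes. `segment_and_score` and the threshold below are stand-ins for the paper's two-stage segmenter, not its actual module.

```python
def recurrent_filter(image, texts, segment_and_score, threshold=0.5, max_iters=10):
    """Progressively filter candidate texts and refine masks with a frozen segmenter.

    segment_and_score(image, texts) is assumed to return (masks, scores), with one
    mask and one confidence score per candidate text.
    """
    for _ in range(max_iters):
        masks, scores = segment_and_score(image, texts)
        kept = [t for t, s in zip(texts, scores) if s >= threshold]
        if kept == texts:                 # fixed point: no text was filtered out
            return masks, texts
        texts = kept
    return segment_and_score(image, texts)[0], texts
```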
ISBN: (Print) 9798350353006
Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations in responses to very specific question formats, typically a multiple-choice response regarding a particular object or attribute, which we term "Type II hallucinations". Additionally, such benchmarks often require external API calls to models that are subject to change. In practice, we observe that a reduction in Type II hallucinations does not lead to a reduction in Type I hallucinations; rather, the two forms of hallucinations are often anti-correlated. To address this, we propose THRONE, a novel object-based automatic framework for quantitatively evaluating Type I hallucinations in LVLM free-form outputs. We use public language models (LMs) to identify hallucinations in LVLM responses and compute informative metrics. By evaluating a large selection of recent LVLMs on public datasets, we show that improvements in existing metrics do not lead to a reduction in Type I hallucinations and that established benchmarks for measuring Type I hallucinations are incomplete. Finally, we provide a simple and effective data augmentation method that reduces Type I and Type II hallucinations as a strong baseline.
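An object-based hallucination metric of the kind described above can be sketched by comparing the objects a language model extracts from a free-form answer against the image's ground-truth objects. The metric names below and the extraction step (stubbed out as plain lists) are illustrative assumptions, not the paper's exact definitions.

```python
def hallucination_metrics(mentioned, ground_truth):
    # mentioned: objects an LM extracted from the LVLM's free-form answer.
    # ground_truth: objects actually annotated in the image.
    mentioned, ground_truth = set(mentioned), set(ground_truth)
    true_objects = mentioned & ground_truth
    precision = len(true_objects) / len(mentioned) if mentioned else 1.0
    recall = len(true_objects) / len(ground_truth) if ground_truth else 1.0
    return {
        "precision": precision,                 # how much of the answer is real
        "recall": recall,                       # how much of the scene is covered
        "hallucination_rate": 1.0 - precision,  # fraction of invented objects
    }

# Example: "car" is hallucinated, "grass" is missed.
print(hallucination_metrics(["dog", "frisbee", "car"], ["dog", "frisbee", "grass"]))
```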
ISBN: (Print) 9798350353006
Recently, diffusion models have emerged as a powerful generative method for 3D point cloud generation tasks. However, few works study the effect of the diffusion model's architecture on 3D point clouds, most resorting to the typical UNet developed for 2D images. Inspired by the wide adoption of Transformers, we study the complementary roles of convolution (from the UNet) and attention (from Transformers). We discover that their respective importance changes with the timestep of the diffusion process. At early stages, attention has an outsized influence because Transformers are found to generate the overall shape more quickly, while at later stages, when fine detail is added, convolution has a larger impact on the generated point cloud's local surface quality. In light of this observation, we propose a time-varying two-stream denoising model that combines convolution layers and transformer blocks. We generate an optimizable mask at each timestep to reweight global and local features, obtaining time-varying fused features. Experimentally, we demonstrate that our proposed method quantitatively outperforms other state-of-the-art methods in terms of visual quality and diversity. Code is available at https://***/Zhiyuan-R/Tiger-Diffusion.
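A hedged sketch of a time-varying two-stream block: a convolution stream and an attention stream produce features that are blended by a learnable, timestep-conditioned gate. The layer choices, gate parameterization, and tensor shapes are assumptions made for the example, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    def __init__(self, dim, num_timesteps):
        super().__init__()
        # dim must be divisible by num_heads for MultiheadAttention.
        self.conv = nn.Conv1d(dim, dim, kernel_size=3, padding=1)        # local stream
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)  # global stream
        self.gate = nn.Embedding(num_timesteps, dim)                     # per-timestep mask

    def forward(self, x, t):
        # x: (B, N, dim) per-point features, t: (B,) integer timesteps.
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)
        global_, _ = self.attn(x, x, x)
        m = torch.sigmoid(self.gate(t)).unsqueeze(1)                     # (B, 1, dim)
        return m * global_ + (1.0 - m) * local                           # time-varying blend
```

With a gate like this, early (noisy) timesteps can learn to lean on the attention stream and later timesteps on the convolution stream, which mirrors the observation in the abstract.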
ISBN: (Print) 9798350353006
This work breaks through the Base-New Tradeoff (BNT) dilemma in prompt tuning, i.e., the better the tuned model generalizes to the base (or target) task, the worse it generalizes to new tasks, and vice versa. Specifically, through an in-depth analysis of the learned features of the base and new tasks, we observe that the BNT stems from a channel bias issue: the vast majority of feature channels are occupied by base-specific knowledge, leading to the collapse of the task-shared knowledge important to new tasks. To address this, we propose the Decoupled Prompt Tuning (DePT) framework, which decouples base-specific knowledge from feature channels into an isolated feature space during prompt tuning, so as to maximally preserve task-shared knowledge in the original feature space and achieve better zero-shot generalization on new tasks. Importantly, DePT is orthogonal to existing prompt tuning approaches and can enhance them with negligible additional computational cost. Extensive experiments on several datasets show the flexibility and effectiveness of DePT. Code is available at https://***/Koorye/DePT.
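The decoupling idea can be sketched with an extra linear head that learns base-specific knowledge in an isolated (linearly transformed) feature space, while new-task inference keeps the original feature space for zero-shot prompting. The head shapes and this split are assumptions for illustration, not the exact DePT recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledHead(nn.Module):
    def __init__(self, dim, num_base_classes):
        super().__init__()
        self.transform = nn.Linear(dim, dim)               # isolated feature space
        self.base_classifier = nn.Linear(dim, num_base_classes)

    def base_logits(self, feat):
        # Base-task supervision flows through the transformed space only,
        # leaving the original feature space to keep task-shared knowledge.
        return self.base_classifier(self.transform(feat))

def new_task_logits(image_feat, text_feats):
    # Zero-shot path for new tasks: untouched original features + text prompts.
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    return image_feat @ text_feats.T
```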
ISBN: (Print) 9798350353006
White balance (WB) algorithms in many commercial cameras assume a single, uniform illumination, leading to undesirable results when multiple lighting sources with different chromaticities exist in the scene. Prior research on multi-illuminant WB typically predicts illumination at the pixel level without fully grasping the scene's actual lighting conditions, including the number and color of light sources, which often results in unnatural outcomes lacking overall consistency. To handle this problem, we present a deep white balancing model that leverages slot attention, where each slot is in charge of representing an individual illuminant. This design enables the model to generate chromaticities and weight maps for individual illuminants, which are then fused to compose the final illumination map. Furthermore, we propose a centroid-matching loss, which regulates the activation of each slot based on the color range, thereby helping the model separate illuminants more effectively. Our method achieves state-of-the-art performance on both single- and multi-illuminant WB benchmarks, and also provides additional information such as the number of illuminants in the scene and their chromaticities. This capability allows for illumination editing, an application not feasible with prior methods.
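A small sketch of how per-slot outputs could be fused into a pixel-wise illumination map as the abstract describes: each slot contributes one chromaticity and one spatial weight map, and their weighted sum is used to white-balance the image. Tensor shapes and the softmax normalization across slots are assumptions for illustration, not the paper's exact design.

```python
import torch

def compose_illumination(chromaticities, weight_maps):
    # chromaticities: (K, 3) one RGB chromaticity per slot/illuminant.
    # weight_maps: (K, H, W) per-slot spatial weights.
    weights = torch.softmax(weight_maps, dim=0)                   # normalize across slots
    return torch.einsum("kc,khw->chw", chromaticities, weights)   # (3, H, W) illumination map

def white_balance(image, illum, eps=1e-6):
    # image: (3, H, W); divide out the estimated per-pixel illumination.
    return image / (illum + eps)
```

Editing the estimated chromaticities before recomposing the map is what makes the illumination-editing application mentioned in the abstract possible.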