Search Results - Inner Mongolia University Library


Search query: "Any field=1994 IEEE Computer-Society Conference on Computer Vision and Pattern Recognition"

22,906 records in total; showing records 311-320.

DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Scarpellini, Gianluca; Fiorini, Stefano; Giuliari, Francesco; Morerio, Pietro; Del Bue, Alessio

Affiliations: Ist Italiano Tecnol IIT Pattern Anal & Comp Vis PAVIS Genoa Italy

ISBN: (print) 9798350353006

Reassembly tasks play a fundamental role in many fields and multiple approaches exist to solve specific reassembly problems. In this context, we posit that a general unified model can effectively address them all, irrespective of the input data type (images, 3D, etc.). We introduce DiffAssemble, a Graph Neural Network (GNN)-based architecture that learns to solve reassembly tasks using a diffusion model formulation. Our method treats the elements of a set, whether pieces of 2D patch or 3D object fragments, as nodes of a spatial graph. Training is performed by introducing noise into the position and rotation of the elements and iteratively denoising them to reconstruct the coherent initial pose. DiffAssemble achieves state-of-the-art (SOTA) results in most 2D and 3D reassembly tasks and is the first learning-based approach that solves 2D puzzles for both rotation and translation. Furthermore, we highlight its remarkable reduction in run-time, performing 11 times faster than the quickest optimization-based method for puzzle solving. Code available at https://***/IITPAVIS/DiffAssemble

Keywords: diffusion model graph neural network puzzle reassembly

Grounded Question-Answering in Long Egocentric Videos

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Di, Shangzhe; Xie, Weidi

Affiliations: Shanghai Jiao Tong Univ CMIC Shanghai Peoples R China; Shanghai AI Lab Shanghai Peoples R China

ISBN: (print) 9798350353006

Existing approaches to video understanding, mainly designed for short videos from a third-person perspective, are limited in their applicability in certain fields, such as robotics. In this paper, we delve into open-ended question-answering (QA) in long, egocentric videos, which allows individuals or robots to inquire about their own past visual experiences. This task presents unique challenges, including the complexity of temporally grounding queries within extensive video content, the high resource demands for precise data annotation, and the inherent difficulty of evaluating open-ended answers due to their ambiguous nature. Our proposed approach tackles these challenges by (i) integrating query grounding and answering within a unified model to reduce error propagation; (ii) employing large language models for efficient and scalable data synthesis; and (iii) introducing a close-ended QA task for evaluation, to manage answer ambiguity. Extensive experiments demonstrate the effectiveness of our method, which also achieves state-of-the-art performance on the QAEgo4D and Ego4D-NLQ benchmarks. Code, data, and models are open-sourced.

Keywords: egocentric vision video grounding video question answering

Material Palette: Extraction of Materials from a Single Image

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Lopes, Ivan; Pizzati, Fabio; de Charette, Raoul

Affiliations: INRIA Paris France; Univ Oxford Oxford England

ISBN: (print) 9798350353013; 9798350353006

Physically-Based Rendering (PBR) is key to modeling the interaction between light and materials, and finds extensive applications across computer graphics domains. However, acquiring PBR materials is costly and requires special apparatus. In this paper, we propose a method to extract PBR materials from a single real-world image. We do so in two steps: first, we map regions of the image to material concept tokens using a diffusion model, allowing the sampling of texture images resembling each material in the scene. Second, we leverage a separate network to decompose the generated textures into spatially varying BRDFs (SVBRDFs), offering us readily usable materials for rendering applications. Our approach relies on existing synthetic material libraries with SVBRDF ground truth. It exploits a diffusion-generated RGB texture dataset to allow generalization to new samples using unsupervised domain adaptation (UDA). Our contributions are thoroughly evaluated on synthetic and real-world datasets. We further demonstrate the applicability of our method for editing 3D scenes with materials estimated from real photographs. Along with video, we share code and models as open-source on the project page: https://***/astra-vision/MaterialPalette

Keywords: brdf computer graphics generative ai inversion Material estimation

RMem: Restricted Memory Banks Improve Video Object Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhou, Junbao; Pang, Ziqi; Wang, Yu-Xiong

Affiliations: Univ Illinois Champaign IL 61820 USA

ISBN: (print) 9798350353006

With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks. This diverges from the prevalent practice of expanding memory banks to accommodate extensive historical information. Our specially designed memory deciphering study offers a pivotal insight underpinning such a strategy: expanding memory banks, while seemingly beneficial, actually increases the difficulty for VOS modules to decode relevant features due to the confusion from redundant information. By restricting memory banks to a limited number of essential frames, we achieve a notable improvement in VOS accuracy. This process balances the importance and freshness of frames to maintain an informative memory bank within a bounded capacity. Additionally, restricted memory banks reduce the training-inference discrepancy in memory lengths compared with continuous expansion. This fosters new opportunities in temporal reasoning and enables us to introduce the previously overlooked temporal positional embedding. Finally, our insights are embodied in RMem (R for restricted), a simple yet effective VOS modification that excels at challenging VOS scenarios and establishes new state of the art for object state changes (on the VOST dataset) and long videos (on the Long Videos dataset). Our code and demos are available at https://***/.

Keywords: egocentric vision embodied ai video object segmentation video understanding

Towards Understanding and Improving Adversarial Robustness of Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jain, Samyak; Dutta, Tanima

Affiliations: Indian Inst Technol BHU Varanasi Varanasi Uttar Pradesh India

ISBN: (print) 9798350353006

Recent literature has demonstrated that vision transformers (VITs) exhibit superior performance compared to convolutional neural networks (CNNs). The majority of recent research on adversarial robustness, however, has predominantly focused on CNNs. In this work, we bridge this gap by analyzing the effectiveness of existing attacks on VITs. We demonstrate that due to the softmax computations in every attention block in VITs, they are inherently vulnerable to floating point underflow errors. This can lead to a gradient masking effect resulting in suboptimal attack strength of well-known attacks, like PGD, Carlini and Wagner (CW) and GAMA. Motivated by this, we propose Adaptive Attention Scaling (AAS) attack that can automatically find the optimal scaling factors of pre-softmax outputs using gradient-based optimization. We show that the proposed simple strategy can be incorporated with any existing adversarial attacks as well as adversarial training methods and achieve improved performance. On VIT-B16, we demonstrate an improved attack strength of up to 2.2% on CIFAR10 and up to 2.9% on CIFAR100 by incorporating the proposed AAS attack with state-of-the-art single attack methods like GAMA attack. Further, we utilise the proposed AAS attack for every few epochs in existing adversarial training methods, which is termed as Adaptive Attention Scaling Adversarial Training (AAS-AT). On incorporating AAS-AT with existing methods, we outperform them on VITs over 1.3-3.5% on CIFAR10. We observe improved performance on ImageNet-100 as well.

Keywords: adversarial robustness Vision Transformers
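The abstract above attributes gradient masking to floating-point underflow inside softmax. The sketch below is only an illustration of that general effect, not the paper's AAS attack: the scores and the scaling factor are hypothetical and fixed by hand, whereas the paper finds scaling factors by gradient-based optimization. It uses Python's double-precision floats, so the gap needed to trigger underflow is exaggerated; in single precision a gap on the order of a hundred is already enough.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of Python floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(p):
    """Jacobian of softmax: d p_i / d z_j = p_i * (delta_ij - p_j)."""
    n = len(p)
    return [[p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical pre-softmax attention scores with a very large gap.
scores = [0.0, 800.0]

p = softmax(scores)        # exp(-800) underflows to exactly 0.0
jac = softmax_jacobian(p)  # every entry is zero: no gradient flows back

# Illustrative, hand-picked scaling of the pre-softmax scores.
scale = 0.01
p_scaled = softmax([scale * z for z in scores])
jac_scaled = softmax_jacobian(p_scaled)  # entries are now non-zero

print(p, p_scaled)
```

With the unscaled scores the Jacobian vanishes entirely, so a gradient-based attack receives no signal through this softmax; after scaling, the smaller probability and its derivatives are representable again.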

Learning unbiased classifiers from biased data with meta-learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ragonesi, Ruggero; Morerio, Pietro; Murino, Vittorio

Affiliations: Ist Italiano Tecnol Pattern Anal & Comp Vis PAVIS Genoa Italy; Univ Verona Dept Comp Sci Verona Italy

ISBN: (print) 9798350302493

It is well known that large deep architectures are powerful models when adequately trained, but may exhibit undesirable behavior leading to confident incorrect predictions, even when evaluated on slightly different test examples. Test data characterized by distribution shifts (from training data distribution), outliers, and adversarial samples are among the types of data affected by this problem. This situation worsens whenever data are biased, meaning that predictions are mostly based on spurious correlations present in the data. Unfortunately, since such correlations occur in most of the data, a model is prevented from correctly generalizing the considered classes. In this work, we tackle this problem from a meta-learning perspective. Considering the dataset as composed of unknown biased and unbiased samples, we first identify these two subsets by a pseudo-labeling algorithm, even if coarsely. Subsequently, we apply a bi-level optimization algorithm in which, in the inner loop, we look for the best parameters guiding the training of the two subsets, while in the outer loop, we train the final model taking benefit from augmented data generated using Mixup. Properly tuning the contributions of biased and unbiased data, together with the regularization introduced by the mixed data has proved to be an effective training strategy to learn unbiased models, showing superior generalization capabilities. Experimental results on synthetically and realistically biased datasets surpass state-of-the-art performance, as compared to existing methods.

Keywords: computer vision
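The abstract above mentions augmented data generated using Mixup. As a rough sketch of the standard Mixup operation only (how the paper pairs biased and unbiased samples, and its choice of hyperparameters, are not stated here), two examples and their one-hot labels are blended with a coefficient drawn from a Beta distribution. The function name, the toy vectors, and `alpha=0.4` are illustrative assumptions.

```python
import random

def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two examples and their labels with lam ~ Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam

# Toy feature vectors and one-hot labels (hypothetical data).
rng = random.Random(0)
x_mix, y_mix, lam = mixup(
    [0.0, 1.0, 2.0], [1.0, 0.0],   # example A and its label
    [4.0, 3.0, 2.0], [0.0, 1.0],   # example B and its label
    rng=rng,
)
print(lam, x_mix, y_mix)
```

The mixed label stays a valid probability vector because it is a convex combination of two one-hot vectors.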

SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Boyuan; Xu, Zhuo; Kirman, Sean; Ichter, Brian; Sadigh, Dorsa; Guibas, Leonidas; Xia, Fei

Affiliations: Google DeepMind London England; Google Res Mountain View CA USA; MIT 77 Massachusetts Ave Cambridge MA 02139 USA

ISBN: (print) 9798350353006

Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While Vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships of physical objects like distances or size difference. We hypothesize that VLMs' limited spatial reasoning capability is due to the lack of 3D spatial knowledge in training data and aim to solve this problem by training VLMs with Internet-scale spatial reasoning data. To this end, we present a system to facilitate this approach. We first develop an automatic 3D spatial VQA data generation framework that scales up to 2 billion VQA examples on 10 million real-world images. We then investigate various factors in training recipe including data quality, training pipeline and VLM architecture. Our work features the first Internet-scale 3D spatial reasoning dataset in metric space. By training a VLM on such data, we significantly enhance its ability on both qualitative and quantitative spatial VQA. Finally, we demonstrate that this VLM unlocks novel downstream applications in chain-of-thought spatial reasoning and robotics due to its quantitative estimation capability. Website: https://***/

Keywords: large language model multimodal spatial reasoning vision language model

THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kaul, Prannay; Li, Zhizhong; Yang, Hao; Dukler, Yonatan; Swaminathan, Ashwin; Taylor, C. J.; Soatto, Stefano

Affiliations: Univ Oxford VGG Oxford England; AWS AI Labs Oxford England

ISBN: (print) 9798350353006

Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations responding to very specific question formats (typically a multiple-choice response regarding a particular object or attribute), which we term "Type II hallucinations". Additionally, such benchmarks often require external API calls to models which are subject to change. In practice, we observe that a reduction in Type II hallucinations does not lead to a reduction in Type I hallucinations but rather that the two forms of hallucinations are often anti-correlated. To address this, we propose THRONE, a novel object-based automatic framework for quantitatively evaluating Type I hallucinations in LVLM free-form outputs. We use public language models (LMs) to identify hallucinations in LVLM responses and compute informative metrics. By evaluating a large selection of recent LVLMs using public datasets, we show that an improvement in existing metrics does not lead to a reduction in Type I hallucinations, and that established benchmarks for measuring Type I hallucinations are incomplete. Finally, we provide a simple and effective data augmentation method to reduce Type I and Type II hallucinations as a strong baseline.

Keywords: benchmark hallucination large language model large vision-language model LLM LVLM

Random Entangled Tokens for Adversarially Robust Vision Transformer

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gong, Huihui; Dong, Mingjing; Mao, Siqi; Camtepe, Seyit; Nepal, Surya; Xu, Chang

Affiliations: Univ Sydney Sydney NSW Australia; CSIRO Data61 Eveleigh Australia; City Univ Hong Kong Hong Kong Peoples R China; Univ New South Wales Sydney NSW Australia

ISBN: (print) 9798350353006

Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) in the realm of computer vision, showcasing tremendous potential. However, recent research has unveiled a susceptibility of ViTs to adversarial attacks, akin to their CNN counterparts. Adversarial training and randomization are two representative effective defenses for CNNs. Some researchers have attempted to apply adversarial training to ViTs and achieved comparable robustness to CNNs, while it is not easy to directly apply randomization to ViTs because of the architecture difference between CNNs and ViTs. In this paper, we delve into the structural intricacies of ViTs and propose a novel defense mechanism termed Random entangled image Transformer (ReiT), which seamlessly integrates adversarial training and randomization to bolster the adversarial robustness of ViTs. Recognizing the challenge posed by the structural disparities between ViTs and CNNs, we introduce a novel module, input-independent random entangled self-attention (II-ReSA). This module optimizes random entangled tokens that lead to "dissimilar" self-attention outputs by leveraging model parameters and the sampled random tokens, thereby synthesizing the self-attention module outputs and random entangled tokens to diminish adversarial similarity. ReiT incorporates two distinct random entangled tokens and employs dual randomization, offering an effective countermeasure against adversarial examples while ensuring comprehensive deduction guarantees. Through extensive experiments conducted on various ViT variants and benchmarks, we substantiate the superiority of our proposed method in enhancing the adversarial robustness of Vision Transformers.

Keywords: Adversarial Robustness Randomized Defence Self-Attention Mechanism Vision Transformers

Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ntinou, Ioanna; Sanchez, Enrique; Tzimiropoulos, Georgios

Affiliations: Queen Mary Univ London London England; Samsung AI Ctr Cambridge Cambridge England

ISBN: (print) 9798350353006

Action Localization is a challenging problem that combines detection and recognition tasks, which are often addressed separately. State-of-the-art methods rely on off-the-shelf bounding box detections pre-computed at high resolution, and propose transformer models that focus on the classification task alone. Such two-stage solutions are prohibitive for real-time deployment. On the other hand, single-stage methods target both tasks by devoting part of the network (generally the backbone) to sharing the majority of the workload, compromising performance for speed. These methods build on adding a DETR head with learnable queries that after cross- and self-attention can be sent to corresponding MLPs for detecting a person's bounding box and action. However, DETR-like architectures are challenging to train and can incur in big complexity. In this paper, we observe that a straight bipartite matching loss can be applied to the output tokens of a vision transformer. This results in a backbone + MLP architecture that can do both tasks without the need of an extra encoder-decoder head and learnable queries. We show that a single MViTv2-S architecture trained with bipartite matching to perform both tasks surpasses the same MViTv2-S when trained with RoI align on pre-computed bounding boxes. With a careful design of token pooling and the proposed training pipeline, our Bipartite-Matching Vision Transformer model, BMViT, achieves +3 mAP on AVA2.2. w.r.t. the two-stage MViTv2-S counterpart. Code is available at https://***/IoannaNti/BMViT

Keywords: Signal encoding
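The abstract above relies on bipartite matching between ground-truth targets and output tokens. The sketch below only illustrates the matching step in isolation, under stated assumptions: the cost matrix is made up, and the paper's actual cost terms are not given here. It finds the minimum-cost one-to-one assignment by brute force, which is fine for a toy example; practical implementations would normally use the Hungarian algorithm instead.

```python
from itertools import permutations

def bipartite_match(cost):
    """Min-cost one-to-one assignment of rows to distinct columns.

    cost[i][j] is the (hypothetical) cost of assigning ground-truth i to
    predicted token j. Brute force; assumes len(cost) <= len(cost[0]).
    """
    n_gt, n_pred = len(cost), len(cost[0])
    best_cols, best_cost = None, float("inf")
    for cols in permutations(range(n_pred), n_gt):
        total = sum(cost[i][c] for i, c in enumerate(cols))
        if total < best_cost:
            best_cols, best_cost = cols, total
    return list(enumerate(best_cols)), best_cost

# Two ground-truth boxes, four output tokens (illustrative costs).
cost = [
    [0.9, 0.1, 0.5, 0.7],
    [0.2, 0.8, 0.6, 0.4],
]
pairs, total = bipartite_match(cost)
print(pairs, total)
```

Each ground-truth row is paired with exactly one token, and the training loss would then be computed only over the matched pairs; how unmatched tokens are handled is a design choice this sketch leaves out.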
