Search Results - Inner Mongolia University Library

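Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016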
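
The field codes and operator rules above can be assembled programmatically. The following is a minimal sketch, not part of the library's own tooling: the helper names `term`, `join`, and `group` are hypothetical, and the only behavior taken from the help text is the set of field codes, the uppercase operators, and the single space on each side of an operator.

```python
# Field codes and operators as listed in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(code: str, value: str) -> str:
    """Format one field term, e.g. K=computer vision (hypothetical helper)."""
    if code not in FIELD_CODES:
        raise ValueError(f"unknown field code: {code!r}")
    return f"{code}={value}"


def join(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by one space per side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first search example from the help text.
query = join(
    "AND",
    group(join("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Validating the operator and field code before joining is one way to catch the lowercase-operator mistake that the rules warn about.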


Search condition: "Any field = 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2013"

4,491 records in total; showing records 61-70

AIGeN: An Adversarial Approach for Instruction Generation in VLN

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rawal, Niyati; Bigazzi, Roberto; Baraldi, Lorenzo; Cucchiara, Rita

Affiliation: Univ Modena & Reggio Emilia, Modena, Italy

ISBN (print): 9798350365474

In the last few years, research interest in Vision-and-Language Navigation (VLN) has grown significantly. VLN is a challenging task that involves an agent following human instructions and navigating in a previously unknown environment to reach a specified goal. Recent work in the literature focuses on different ways to augment the available datasets of instructions for improving navigation performance by exploiting synthetic training data. In this work, we propose AIGeN, a novel architecture inspired by Generative Adversarial Networks (GANs) that produces meaningful and well-formed synthetic instructions to improve navigation agents' performance. The model is composed of a Transformer decoder (GPT-2) and a Transformer encoder (BERT). During the training phase, the decoder generates sentences for a sequence of images describing the agent's path to a particular point while the encoder discriminates between real and fake instructions. Experimentally, we evaluate the quality of the generated instructions and perform extensive ablation studies. Additionally, we generate synthetic instructions for 217K trajectories using AIGeN on Habitat-Matterport 3D Dataset (HM3D) and show an improvement in the performance of an off-the-shelf VLN method. The validation analysis of our proposal is conducted on REVERIE and R2R and highlights the promising aspects of our proposal, achieving state-of-the-art performance.

Keywords: Generative Adversarial Networks; Text Generation; Vision-and-Language Navigation

Potential Risk Localization via Weak Labeling out of Blind Spot

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Shimomura, Kota; Hirakawa, Tsubasa; Yamashita, Takayoshi; Fujiyoshi, Hironobu

Affiliation: Chubu Univ, Kasugai, Aichi, Japan

ISBN (print): 9798350365474

Achieving fully autonomous driving requires not only understanding the current surrounding conditions but also predicting how objects that could lead to potential risks may change in the future. Predicting potential risk regions, especially where pedestrians or vehicles might suddenly appear, is crucial for safe autonomous driving and accident avoidance. Constructing datasets annotated with potential risk regions is costly. Therefore, conventional methods have proposed blind spot estimation using depth maps or segmentation masks through automatic labeling. However, these methods are limited in applicability due to their reliance on camera parameters or point clouds. In this study, we propose a method to automatically generate labels from depth maps and segmentation masks and estimate potential risk regions in 2D. Our automatic labeling algorithm relies solely on images, making it applicable to all onboard camera datasets. To demonstrate the effectiveness of our approach, we define regions where pedestrians or vehicles might emerge from blind spots as potential risk regions and annotate them to create a new dataset extended with potential risk region annotations. Experiments using the Cityscapes Dataset show that weakly training with labels generated by our proposed method achieves equal or superior accuracy compared with supervised training with manually annotated ground truth (GT). Furthermore, experiments using the Mapillary Vistas Dataset and BDD100K Dataset demonstrate the versatility of our approach.

Keywords: Advanced Driver-Assistance Systems; Autonomous Driving; Computer Vision; Deep Learning

Interpreting COVID Lateral Flow Tests' Results with Foundation Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Pandey, Stuti; Myers-Dean, Josh; Reynolds, Jarek; Gurari, Danna

Affiliations: Univ Colorado, Boulder, CO 80309, USA; Univ Texas Austin, Austin, TX 78712, USA

ISBN (print): 9798350365474

Lateral flow tests (LFTs) enable rapid, low-cost testing for health conditions including Covid, pregnancy, HIV, and malaria. Automated readers of LFT results can yield many benefits including empowering blind people to independently learn about their health and accelerating data entry for large-scale monitoring (e.g., for pandemics such as Covid) by using only a single photograph per LFT test. Accordingly, we explore the abilities of modern foundation vision language models (VLMs) in interpreting such tests. To enable this analysis, we first create a new labeled dataset with hierarchical segmentations of each LFT test and its nested test result window. We call this dataset LFT-Grounding. Next, we benchmark eight modern VLMs in zero-shot settings for analyzing these images. We demonstrate that current VLMs frequently fail to correctly identify the type of LFT test, interpret the test results, locate the nested result window of the LFT tests, and recognize LFT tests when they are partially obfuscated. To facilitate community-wide progress towards automated LFT reading, we publicly release our dataset at https://***/lft_grounding_foundation_models/

Keywords: Accessibility; Foundation Vision Language Models; Lateral Flow Test; Prompt Engineering; Zero-Shot

Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gupta, Anshul; Vuillecard, Pierre; Farkhondeh, Arya; Odobez, Jean-Marc

Affiliations: Idiap Res Inst, Martigny, Switzerland; Ecole Polytech Fed Lausanne, Lausanne, Switzerland

ISBN (print): 9798350365474

Contextual cues related to a person's pose and interactions with objects and other people in the scene can provide valuable information for gaze following. While existing methods have focused on dedicated cue extraction methods, in this work we investigate the zero-shot capabilities of Vision-Language Models (VLMs) for extracting a wide array of contextual cues to improve gaze following performance. We first evaluate various VLMs, prompting strategies, and in-context learning (ICL) techniques for zero-shot cue recognition performance. We then use these insights to extract contextual cues for gaze following, and investigate their impact when incorporated into a state-of-the-art model for the task. Our analysis indicates that BLIP-2 is the overall top-performing VLM and that ICL can improve performance. We also observe that VLMs are sensitive to the choice of the text prompt, although ensembling over multiple text prompts can provide more robust performance. Additionally, we discover that using the entire image along with an ellipse drawn around the target person is the most effective strategy for visual prompting. For gaze following, incorporating the extracted cues results in better generalization performance, especially when considering a larger set of cues, highlighting the potential of this approach.

Keywords: Gaze Following; Vision-Language; Zero-Shot Evaluation

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Huang, Ning-Chi; Chang, Chi-Chih; Lin, Wei-Cheng; Taka, Endri; Marculescu, Diana; Wu, Kai-Chiang

Affiliations: Natl Yang Ming Chiao Tung Univ, Hsinchu, Taiwan; Univ Texas Austin, Austin, TX, USA

ISBN (print): 9798350365474

N:M sparsity is an emerging model compression method supported by a growing number of accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing N:M sparsity methods compress neural networks with a uniform setting for all layers in a network or heuristically determine the layer-wise configuration by considering the number of parameters in each layer. However, very few methods have been designed for obtaining a layer-wise customized N:M sparse configuration for vision transformers (ViTs), which usually consist of transformer blocks involving the same number of parameters. In this work, to address the challenge of selecting a suitable sparse configuration for ViTs on N:M sparsity-supporting accelerators, we propose ELSA, Exploiting Layer-wise N:M Sparsity for ViTs. Considering not only all N:M sparsity levels supported by a given accelerator but also the expected throughput improvement, our methodology can reap the benefits of accelerators supporting mixed sparsity by trading off negligible accuracy loss with both memory usage and inference time reduction for ViT models. For instance, our approach achieves a noteworthy 2.9x reduction in FLOPs for both Swin-B and DeiT-B with only a marginal degradation of accuracy on ImageNet. Our code is publicly available at https://***/ningchihuang/ELSA.

Keywords: Deep neural networks
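
As a reading aid for the N:M sparsity pattern mentioned in this abstract, here is a minimal sketch of plain magnitude-based N:M pruning. It is not the ELSA method, which selects a sparsity level per layer. It assumes the common convention that at most N of every M consecutive weights stay non-zero; the function name `prune_n_m` and the sample weights are hypothetical.

```python
def prune_n_m(weights, n, m):
    """Keep the n largest-magnitude values in each group of m consecutive
    weights and zero the rest (assumed convention: n non-zeros per m)."""
    if not 0 <= n <= m:
        raise ValueError("n must satisfy 0 <= n <= m")
    if len(weights) % m != 0:
        raise ValueError("number of weights must be a multiple of m")
    pruned = []
    for start in range(0, len(weights), m):
        block = weights[start:start + m]
        # Indices of the n largest magnitudes; ties go to the lower index
        # because sorted() is stable.
        keep = set(sorted(range(m), key=lambda i: -abs(block[i]))[:n])
        pruned.extend(w if i in keep else 0.0 for i, w in enumerate(block))
    return pruned


# Hypothetical weight row pruned to a 2:4 pattern.
row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
sparse_row = prune_n_m(row, n=2, m=4)
print(sparse_row)  # [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```

Under this convention, a layer-wise scheme would vary `n` and `m` from one layer to the next instead of applying a single setting to the whole network.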

QAttn: Efficient GPU Kernels for mixed-precision Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kluska, Piotr; Castello, Adrian; Scheidegger, Florian; Malossi, A. Cristiano I.; Quintana-Orti, Enrique S.

Affiliations: IBM Res Europe, Ruschlikon, Switzerland; Univ Politecn Valencia, Valencia, Spain

ISBN (print): 9798350365474

Vision Transformers have demonstrated outstanding performance in computer vision tasks. Nevertheless, this superior performance for large models comes at the expense of increasing memory usage for storing the parameters and intermediate activations. To accelerate model inference, in this work we develop and evaluate integer and mixed-precision kernels in Triton for the efficient execution of two fundamental building blocks of transformers, the linear layer and attention, on graphics processing units (GPUs). On an NVIDIA A100 GPU, our kernel implementations of Vision Transformers achieve a throughput speedup of up to 7x compared with reference kernels in PyTorch floating-point single precision (FP32). Additionally, the top-1 accuracy of the ViT Large model drops by less than one percent on the ImageNet1K classification task. We also observe up to 6x increased throughput by applying our kernels to the Segment Anything Model image encoder while keeping the mIOU close to the FP32 reference on the COCO2017 dataset for static and dynamic quantization. Furthermore, our kernels demonstrate improved speed over the TensorRT INT8 linear layer, and we improve the throughput of base FP16 (half precision) Triton attention on average by up to 19 +/- 4.01%. We have open-sourced the QAttn framework, which is tightly integrated with the PyTorch quantization workflow: https://***/IBM/qattn.

Keywords: compression; instance segmentation; object classification; quantization; vision transformers

CAGE: Circumplex Affect Guided Expression Inference

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wagner, Niklas; Maetzler, Felix; Vossberg, Samed R.; Schneider, Helen; Pavlitska, Svetlana; Zoellner, J. Marius

Affiliations: Karlsruhe Inst Technol KIT, Karlsruhe, Germany; FZI Res Ctr Informat Technol, Karlsruhe, Germany

ISBN (print): 9798350365474

Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to the common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scaled MaxViT-based model architecture, we evaluate the impact of discrete expression category labels in training with the continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels helps to significantly improve expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal, achieving a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: https://***/wagner-niklas/CAGE_expression_inference.

Keywords: Computer Vision; Expression Inference; Transformer

Knowledge Distillation for Efficient Instance Semantic Segmentation with Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Maohui; Halstead, Michael; McCool, Chris

Affiliations: Univ Bonn, Bonn, Germany; Lamarr Inst Machine Learning & Artificial Intelli, Dortmund, Germany

ISBN (print): 9798350365474

Instance-based semantic segmentation provides detailed per-pixel scene understanding information crucial for both computer vision and robotics applications. However, state-of-the-art approaches such as Mask2Former are computationally expensive, and reducing this computational burden while maintaining high accuracy remains challenging. Knowledge distillation has been regarded as a potential way to compress neural networks, but to date limited work has explored how to apply this to distill information from the output queries of a model such as Mask2Former. In this paper, we match the output queries of the student and teacher models to enable a query-based knowledge distillation scheme. We independently match the teacher and the student to the ground truth and use this to define the teacher-to-student relationship for knowledge distillation. Using this approach, we show that it is possible to perform knowledge distillation where the student models can have a lower number of queries and the backbone can be changed from a Transformer architecture to a convolutional neural network architecture. Experiments on two challenging agricultural datasets, sweet pepper (BUP20) and sugar beet (SB20), and Cityscapes demonstrate the efficacy of our approach. Across the three datasets, the student models obtain an average absolute performance improvement in AP of 1.8 and 1.9 points for the ResNet-50 and Swin-Tiny backbones, respectively. To the best of our knowledge, this is the first work to propose knowledge distillation schemes for instance semantic segmentation with transformer-based models.

Keywords: Computer Vision for Agriculture Automation; Knowledge Distillation; Efficient Instance Segmentation; Transformer

Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Englert, Bruno B.; Piva, Fabrizio J.; Kerssies, Tommie; de Geus, Daan; Dubbelman, Gijs

Affiliation: Eindhoven Univ Technol, Eindhoven, Netherlands

ISBN (print): 9798350365474

Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) methods for the semantic segmentation task are complementary. Results show that combining VFMs with UDA has two main benefits: (a) it allows for better UDA performance while maintaining the out-of-distribution performance of VFMs, and (b) it makes certain time-consuming UDA components redundant, thus enabling significant inference speedups. Specifically, with equivalent model sizes, the resulting VFM-UDA method achieves an 8.4x speed increase over the prior non-VFM state of the art, while also improving performance by +1.2 mIoU in the UDA setting and by +6.1 mIoU in terms of out-of-distribution generalization. Moreover, when we use a VFM with 3.6x more parameters, the VFM-UDA approach maintains a 3.3x speedup, while improving the UDA performance by +3.1 mIoU and the out-of-distribution performance by +10.3 mIoU. These results underscore the significant benefits of combining VFMs with UDA, setting new standards and baselines for Unsupervised Domain Adaptation in semantic segmentation. The implementation is available at https://***/tue-mps/vfmuda.

Keywords: foundation model; generalization; semantic segmentation; unsupervised domain adaptation; vision foundation model

Segment Anything Model for Road Network Graph Extraction

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hetang, Congrui; Xue, Haoru; Le, Cindy; Yue, Tianwei; Wang, Wenping; He, Yihui

Affiliations: Carnegie Mellon Univ, Pittsburgh, PA 15213, USA; Columbia Univ, New York, NY, USA

ISBN (print): 9798350365474

We propose SAM-Road, an adaptation of the Segment Anything Model (SAM) [27] for extracting large-scale, vectorized road network graphs from satellite imagery. To predict graph geometry, we formulate it as a dense semantic segmentation task, leveraging the inherent strengths of SAM. The image encoder of SAM is fine-tuned to produce probability masks for roads and intersections, from which the graph vertices are extracted via simple non-maximum suppression. To predict graph topology, we designed a lightweight transformer-based graph neural network, which leverages the SAM image embeddings to estimate the edge existence probabilities between vertices. Our approach directly predicts the graph vertices and edges for large regions without expensive and complex post-processing heuristics and is capable of building complete road network graphs spanning multiple square kilometers in a matter of seconds. With its simple, straightforward, and minimalist design, SAM-Road achieves comparable accuracy with the state-of-the-art method RNGDet++[57], while being 40 times faster on the City-scale dataset. We thus demonstrate the power of a foundational vision model when applied to a graph learning task. The code is available at https://***/htcr/sam_road.

Keywords: autonomous driving; computer vision; foundation model; graph; graph neural network; mapping; navigation; remote sensing; segment anything; semantic segmentation; transformer
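
The abstract above mentions extracting graph vertices from probability masks via simple non-maximum suppression. The sketch below illustrates generic greedy point-wise non-maximum suppression only; it is not the SAM-Road implementation. The function name `nms_points`, the suppression radius, the score threshold, and the candidate points are all hypothetical.

```python
import math


def nms_points(candidates, radius, min_score=0.5):
    """Greedy non-maximum suppression over (x, y, score) candidates.

    Visits candidates from highest to lowest score and keeps one only if
    no already-kept point lies within `radius` of it.
    """
    kept = []
    for x, y, score in sorted(candidates, key=lambda c: c[2], reverse=True):
        if score < min_score:
            break  # remaining candidates score even lower
        if all(math.hypot(x - kx, y - ky) > radius for kx, ky, _ in kept):
            kept.append((x, y, score))
    return kept


# Hypothetical candidate peaks taken from a probability mask.
candidates = [
    (10, 10, 0.95),
    (11, 10, 0.90),  # suppressed: within 3 px of (10, 10)
    (30, 12, 0.80),  # suppressed: within 3 px of (31, 13)
    (31, 13, 0.85),
    (50, 50, 0.20),  # dropped: below the score threshold
]
vertices = nms_points(candidates, radius=3.0)
print(vertices)  # [(10, 10, 0.95), (31, 13, 0.85)]
```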

Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
