ISBN (print): 9798350365474
Vision Transformers have demonstrated outstanding performance in computer vision tasks. Nevertheless, this superior performance for large models comes at the expense of increasing memory usage for storing the parameters and intermediate activations. To accelerate model inference, in this work we develop and evaluate integer and mixed-precision kernels in Triton for the efficient execution of two fundamental building blocks of transformers, the linear layer and attention, on graphics processing units (GPUs). On an NVIDIA A100 GPU, our kernel implementations of Vision Transformers achieve a throughput speedup of up to 7x compared with reference kernels in PyTorch floating-point single precision (FP32). Additionally, the top-1 accuracy of the ViT-Large model drops by less than one percent on the ImageNet-1K classification task. We also observe up to 6x increased throughput by applying our kernels to the Segment Anything Model image encoder while keeping the mIoU close to the FP32 reference on the COCO2017 dataset for both static and dynamic quantization. Furthermore, our kernels demonstrate improved speed over the TensorRT INT8 linear layer, and we improve the throughput of the baseline FP16 (half precision) Triton attention on average by up to 19 ± 4.01%. We have open-sourced the QAttn framework, which is tightly integrated with the PyTorch quantization workflow: https://***/IBM/qattn.
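As a rough illustration of the quantization scheme such kernels implement, the sketch below emulates a dynamically quantized INT8 linear layer in plain PyTorch: activations and weights are quantized per tensor with symmetric scales, multiplied with integer accumulation, and dequantized with the product of the two scales. The helper names are ours; this is not the QAttn Triton code itself, which fuses these steps into a single GPU kernel.

```python
import torch

def quantize_symmetric(x, bits=8):
    # Per-tensor symmetric quantization: the scale maps max |x| onto the int8 range.
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().amax().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

def int8_linear(x_fp32, w_fp32, bias=None):
    # Dynamic quantization: activations are quantized at run time, weights ahead of time.
    xq, sx = quantize_symmetric(x_fp32)
    wq, sw = quantize_symmetric(w_fp32)
    # Integer matmul with wide accumulation, then dequantization with the combined scale.
    acc = (xq.to(torch.int32) @ wq.t().to(torch.int32)).to(torch.float32)
    y = acc * (sx * sw)
    if bias is not None:
        y = y + bias
    return y

x = torch.randn(4, 768)       # token activations
w = torch.randn(3072, 768)    # linear-layer weight
print(int8_linear(x, w).shape)  # torch.Size([4, 3072])
```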
ISBN (print): 9798350365474
Understanding emotions and expressions is a task of interest across multiple disciplines, especially for improving user experiences. Contrary to common perception, it has been shown that emotions are not discrete entities but instead exist along a continuum. People understand discrete emotions differently due to a variety of factors, including cultural background, individual experiences, and cognitive biases. Therefore, most approaches to expression understanding, particularly those relying on discrete categories, are inherently biased. In this paper, we present a comparative in-depth analysis of two common datasets (AffectNet and EMOTIC) equipped with the components of the circumplex model of affect. Further, we propose a model for the prediction of facial expressions tailored for lightweight applications. Using a small-scale MaxViT-based model architecture, we evaluate the impact of training with discrete expression category labels alongside continuous valence and arousal labels. We show that considering valence and arousal in addition to discrete category labels significantly improves expression inference. The proposed model outperforms the current state-of-the-art models on AffectNet, establishing it as the best-performing model for inferring valence and arousal, achieving a 7% lower RMSE. Training scripts and trained weights to reproduce our results can be found here: https://***/wagner-niklas/CAGE_expression_inference.
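The abstract does not spell out the training objective; the sketch below shows one plausible way to combine discrete category supervision with continuous valence/arousal regression on a shared feature vector. The MaxViT backbone is omitted, and the head structure and loss weighting are illustrative rather than the authors' exact formulation.

```python
import torch
import torch.nn as nn

class ExpressionHead(nn.Module):
    """Joint head: discrete expression logits plus continuous valence/arousal."""
    def __init__(self, feat_dim=512, num_classes=8):
        super().__init__()
        self.cls = nn.Linear(feat_dim, num_classes)   # discrete expression categories
        self.va = nn.Linear(feat_dim, 2)              # valence, arousal in [-1, 1]

    def forward(self, feats):
        return self.cls(feats), torch.tanh(self.va(feats))

def joint_loss(logits, va_pred, labels, va_true, alpha=1.0):
    # Combine categorical and continuous supervision; alpha balances the two terms.
    return nn.functional.cross_entropy(logits, labels) + \
           alpha * nn.functional.mse_loss(va_pred, va_true)

feats = torch.randn(16, 512)                          # backbone features (placeholder)
logits, va = ExpressionHead()(feats)
loss = joint_loss(logits, va, torch.randint(0, 8, (16,)), torch.rand(16, 2) * 2 - 1)
```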
ISBN (print): 9798350365474
Accurate motion capture is useful for sports motion analysis but comes with high acquisition costs. Monocular or few-camera multi-view pose estimation provides an accessible but less accurate alternative, especially for sports motion, because models are trained on datasets of daily activities. In addition, multi-view estimation is still costly due to camera calibration. Therefore, it is desirable to develop an accurate and cost-effective motion capture system for daily training in sports. In this paper, we propose an accurate and convenient sports motion capture system based on unsupervised fine-tuning. The proposed system estimates 3D joint positions by multi-view estimation based on automatic calibration with the human body. These results are used as pseudo-labels for fine-tuning a recent high-performance monocular 3D pose estimation model. Since fine-tuning improves the model's accuracy for sports motion, we can choose multi-view or monocular estimation depending on the situation. We evaluated the system using a running motion dataset and ASPset-510, and showed that fine-tuning improved the performance of monocular estimation to the same level as that of multi-view estimation for running motion. Our proposed system can be useful for daily motion analysis in sports.
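A minimal sketch of the pseudo-label fine-tuning step, assuming the calibrated multi-view 3D estimates have already been computed and paired with the corresponding monocular inputs. The loss and optimizer choices are ours, not necessarily the authors'.

```python
import torch

def finetune_monocular(model, loader, epochs=5, lr=1e-5):
    """Fine-tune a monocular 3D pose model on multi-view pseudo-labels.
    Each batch pairs a monocular input with the 3D joints obtained from the
    auto-calibrated multi-view estimate, used here as the regression target."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for inputs, pseudo_3d in loader:
            pred_3d = model(inputs)                                       # (B, J, 3)
            loss = torch.nn.functional.mse_loss(pred_3d, pseudo_3d)       # MPJPE-style L2 objective
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Toy usage with a dummy model mapping 2D keypoints (17 joints) to 3D joints.
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(17 * 2, 17 * 3),
                            torch.nn.Unflatten(1, (17, 3)))
loader = [(torch.randn(4, 17, 2), torch.randn(4, 17, 3))]
finetune_monocular(model, loader, epochs=1)
```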
ISBN (print): 9798350365474
We propose SAM-Road, an adaptation of the Segment Anything Model (SAM) [27] for extracting large-scale, vectorized road network graphs from satellite imagery. To predict graph geometry, we formulate it as a dense semantic segmentation task, leveraging the inherent strengths of SAM. The image encoder of SAM is fine-tuned to produce probability masks for roads and intersections, from which the graph vertices are extracted via simple non-maximum suppression. To predict graph topology, we design a lightweight transformer-based graph neural network, which leverages the SAM image embeddings to estimate the edge existence probabilities between vertices. Our approach directly predicts the graph vertices and edges for large regions without expensive and complex post-processing heuristics and is capable of building complete road network graphs spanning multiple square kilometers in a matter of seconds. With its simple, straightforward, and minimalist design, SAM-Road achieves accuracy comparable to the state-of-the-art method RNGDet++ [57], while being 40 times faster on the City-scale dataset. We thus demonstrate the power of a foundational vision model when applied to a graph learning task. The code is available at https://***/htcr/sam_road.
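The vertex-extraction step can be pictured as keeping local maxima of the predicted road/intersection probability masks. The sketch below is our reading of that "simple non-maximum suppression" idea, with an illustrative threshold and window size rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def extract_vertices(prob_map, threshold=0.5, kernel=3):
    """Keep pixels that are local maxima of a per-pixel probability map.
    prob_map: (H, W) tensor of probabilities from the fine-tuned mask decoder."""
    p = prob_map[None, None]                                    # (1, 1, H, W)
    pooled = F.max_pool2d(p, kernel, stride=1, padding=kernel // 2)
    keep = (p == pooled) & (p > threshold)                      # local maximum above threshold
    ys, xs = torch.nonzero(keep[0, 0], as_tuple=True)
    return torch.stack([xs, ys], dim=1)                         # (N, 2) vertex coordinates

verts = extract_vertices(torch.rand(256, 256))
print(verts.shape)
```

The edge-existence head would then score vertex pairs from the SAM image embeddings; that part is omitted here.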
ISBN (print): 9798350365474
Instance-based semantic segmentation provides detailed per-pixel scene understanding information crucial for both computer vision and robotics applications. However, state-of-the-art approaches such as Mask2Former are computationally expensive, and reducing this computational burden while maintaining high accuracy remains challenging. Knowledge distillation has been regarded as a potential way to compress neural networks, but to date limited work has explored how to apply it to distill information from the output queries of a model such as Mask2Former. In this paper, we match the output queries of the student and teacher models to enable a query-based knowledge distillation scheme. We independently match the teacher and the student to the ground truth and use this to define the teacher-to-student relationship for knowledge distillation. Using this approach, we show that it is possible to perform knowledge distillation where the student models can have fewer queries and the backbone can be changed from a Transformer architecture to a convolutional neural network architecture. Experiments on two challenging agricultural datasets, sweet pepper (BUP20) and sugar beet (SB20), and on Cityscapes demonstrate the efficacy of our approach. Across the three datasets, the student models obtain an average absolute performance improvement in AP of 1.8 and 1.9 points for the ResNet-50 and Swin-Tiny backbones, respectively. To the best of our knowledge, this is the first work to propose knowledge distillation schemes for instance semantic segmentation with transformer-based models.
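A sketch of the matching idea as we understand it: student and teacher queries are independently Hungarian-matched to the ground truth, and the resulting pairs are aligned with a distillation loss. For brevity the matching cost uses class probabilities only (a Mask2Former-style matcher would also include mask terms), and the KL temperature is illustrative.

```python
import torch
from scipy.optimize import linear_sum_assignment

def match_to_gt(query_logits, gt_labels):
    """Hungarian matching of queries (Q, C) to ground-truth instance labels (G,)."""
    prob = query_logits.softmax(-1)                   # (Q, C)
    cost = -prob[:, gt_labels]                        # (Q, G): negative class probability
    q_idx, g_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return torch.as_tensor(q_idx), torch.as_tensor(g_idx)

def query_kd_loss(student_logits, teacher_logits, gt_labels, T=2.0):
    # Match teacher and student to the same ground truth, then align the paired queries.
    s_idx, g_s = match_to_gt(student_logits, gt_labels)
    t_idx, g_t = match_to_gt(teacher_logits, gt_labels)
    # Reorder so student/teacher queries correspond to the same ground-truth instance.
    s_sorted = s_idx[torch.argsort(g_s)]
    t_sorted = t_idx[torch.argsort(g_t)]
    return torch.nn.functional.kl_div(
        (student_logits[s_sorted] / T).log_softmax(-1),
        (teacher_logits[t_sorted] / T).softmax(-1),
        reduction="batchmean") * T * T

loss = query_kd_loss(torch.randn(100, 20), torch.randn(100, 20), torch.tensor([3, 7, 1]))
```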
ISBN (print): 9798350365474
Achieving robust generalization across diverse data domains remains a significant challenge in computer vision. This challenge is important in safety-critical applications, where deep-neural-network-based systems must perform reliably under various environmental conditions not seen during training. Our study investigates whether the generalization capabilities of Vision Foundation Models (VFMs) and Unsupervised Domain Adaptation (UDA) methods for the semantic segmentation task are complementary. Results show that combining VFMs with UDA has two main benefits: (a) it allows for better UDA performance while maintaining the out-of-distribution performance of VFMs, and (b) it makes certain time-consuming UDA components redundant, thus enabling significant inference speedups. Specifically, with equivalent model sizes, the resulting VFM-UDA method achieves an 8.4x speed increase over the prior non-VFM state of the art, while also improving performance by +1.2 mIoU in the UDA setting and by +6.1 mIoU in terms of out-of-distribution generalization. Moreover, when we use a VFM with 3.6x more parameters, the VFM-UDA approach maintains a 3.3x speedup, while improving the UDA performance by +3.1 mIoU and the out-of-distribution performance by +10.3 mIoU. These results underscore the significant benefits of combining VFMs with UDA, setting new standards and baselines for Unsupervised Domain Adaptation in semantic segmentation. The implementation is available at https://***/tue-mps/vfmuda.
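The abstract does not detail the UDA recipe; the sketch below shows a generic self-training step that one might pair with a VFM backbone: source images are supervised directly, while target images use confidence-filtered pseudo-labels from an EMA teacher. The confidence threshold, momentum, and the toy model are illustrative, not the paper's values.

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, m=0.999):
    # Exponential moving average of student weights, the usual self-training teacher.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

def uda_step(student, teacher, src_img, src_lbl, tgt_img, opt, conf_thr=0.9):
    # Supervised loss on labeled source data, pseudo-label loss on unlabeled target data.
    src_loss = torch.nn.functional.cross_entropy(student(src_img), src_lbl, ignore_index=255)
    with torch.no_grad():
        probs = teacher(tgt_img).softmax(1)
        conf, pseudo = probs.max(1)
        pseudo[conf < conf_thr] = 255                  # ignore low-confidence pixels
    tgt_loss = torch.nn.functional.cross_entropy(student(tgt_img), pseudo, ignore_index=255)
    loss = src_loss + tgt_loss
    opt.zero_grad(); loss.backward(); opt.step()
    ema_update(teacher, student)
    return loss.item()

# Toy usage with a 1x1-conv "segmentation head" standing in for a VFM-based model.
student = torch.nn.Conv2d(3, 19, 1)
teacher = copy.deepcopy(student).requires_grad_(False)
opt = torch.optim.SGD(student.parameters(), lr=0.01)
uda_step(student, teacher,
         torch.randn(2, 3, 64, 64), torch.randint(0, 19, (2, 64, 64)),
         torch.randn(2, 3, 64, 64), opt)
```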
ISBN (print): 9798350365474
Medical image captioning plays an important role in modern healthcare, improving clinical report generation and aiding radiologists in detecting abnormalities and reducing misdiagnosis. Complex visual and textual data biases make this task more challenging. Recent advancements in transformer-based models have significantly improved the generation of radiology reports from medical images. However, these models require substantial computational resources for training and have been observed to produce unnatural language outputs when trained solely on raw image-text pairs. Our aim is to generate more detailed, image-specific reports and to explain the reasoning behind the generated text through image-text alignment. Given the high computational demands of end-to-end model training, we introduce a two-step training methodology with an Intelligent Visual Encoder for Bridging Modalities in Report Generation (InVERGe) model. This model incorporates a lightweight transformer known as the Cross-Modal Query Fusion Layer (CMQFL), which utilizes the output of a frozen encoder to identify the most relevant text-grounded image embedding. This layer bridges the gap between the encoder and decoder, significantly reducing the workload on the decoder and enhancing the alignment between vision and language. Our experiments, conducted on the MIMIC-CXR, Indiana University chest X-ray, and CDD-CESM breast image datasets, demonstrate the effectiveness of our approach. Code: https://***/labsroy007/InVERGe
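As a rough sketch of what a query-based fusion layer between a frozen image encoder and a language decoder can look like, the module below lets a small set of learnable queries cross-attend to frozen image embeddings and passes the resulting tokens on to the decoder. Module and parameter names are ours, not the InVERGe implementation.

```python
import torch
import torch.nn as nn

class QueryFusionLayer(nn.Module):
    """Learnable queries cross-attend to frozen image embeddings; the compact
    query tokens are what the report decoder conditions on."""
    def __init__(self, dim=768, num_queries=32, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, image_embeds):                      # (B, N, dim) from a frozen encoder
        q = self.queries.unsqueeze(0).expand(image_embeds.size(0), -1, -1)
        q = q + self.cross_attn(self.norm1(q), image_embeds, image_embeds)[0]
        q = q + self.ffn(self.norm2(q))
        return q                                          # (B, num_queries, dim) for the decoder

out = QueryFusionLayer()(torch.randn(2, 196, 768))
print(out.shape)  # torch.Size([2, 32, 768])
```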
ISBN (print): 9798350365474
To compete with existing mobile architectures, MobileViG introduces Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling at most 1% behind models with similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Beyond image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 AP(box) and 0.7 AP(mask), and MobileViGv2-B outperforms MobileViG-B by 1.0 AP(box) and 0.7 AP(mask). For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% mIoU and MobileViGv2-B achieves 44.3% mIoU.
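A rough sketch of an MGC-style token mixer under our reading of the abstract: a depthwise convolution acts as the conditional positional encoding, and each token then aggregates max-relative features from sparsely strided neighbors along its row and column. Hyperparameters and module names are illustrative, not the released MobileViGv2 code.

```python
import torch
import torch.nn as nn

class MobileGraphConvSketch(nn.Module):
    """Sketch of a sparse graph token mixer: conditional positional encoding via a
    depthwise conv, then max-relative aggregation over every k-th row/column token."""
    def __init__(self, dim, k=8):
        super().__init__()
        self.cpe = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)   # conditional pos. encoding
        self.fc = nn.Conv2d(2 * dim, dim, 1)
        self.k = k

    def forward(self, x):                       # x: (B, C, H, W) token grid
        x = x + self.cpe(x)
        rel = torch.zeros_like(x)
        for shift in range(self.k, max(x.shape[2], x.shape[3]), self.k):
            rel = torch.maximum(rel, torch.roll(x, shift, dims=2) - x)   # column neighbors
            rel = torch.maximum(rel, torch.roll(x, shift, dims=3) - x)   # row neighbors
        return self.fc(torch.cat([x, rel], dim=1))

y = MobileGraphConvSketch(64)(torch.randn(1, 64, 14, 14))
print(y.shape)  # torch.Size([1, 64, 14, 14])
```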
ISBN (print): 9798350365474
Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art efficient image SR methods are based on convolutional neural networks. Few attempts have been made with Mamba, despite its long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight image SR network that incorporates Vision Mamba and a distillation strategy. DVMSR consists of three modules: a feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several RSSBs, each of which contains several Vision Mamba Modules (ViMMs) together with a residual connection. To improve efficiency while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network: we leverage the rich representation knowledge of the teacher network as additional supervision for the output of the lightweight student network. Extensive experiments demonstrate that our proposed DVMSR outperforms state-of-the-art efficient SR methods in terms of model parameters while maintaining comparable PSNR and SSIM performance. The source code is available at https://***/nathan66666/***
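A minimal sketch of output-level distillation for SR, assuming a frozen teacher and a lightweight student that produce super-resolved images of the same size. The L1 losses and weighting are illustrative rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def distill_sr_loss(student_sr, teacher_sr, hr_target, lam=0.5):
    """Train the lightweight student on ground-truth HR images while also
    matching the frozen teacher's reconstruction (L1 is the usual SR objective)."""
    sup = F.l1_loss(student_sr, hr_target)              # supervised reconstruction loss
    kd = F.l1_loss(student_sr, teacher_sr.detach())     # distillation from the teacher output
    return sup + lam * kd

hr = torch.rand(2, 3, 128, 128)
loss = distill_sr_loss(torch.rand_like(hr), torch.rand_like(hr), hr)
```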
ISBN (print): 9798350365474
In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state of the art in terms of both accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. No specific training data was provided for the challenge, so the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, the evaluation methods, the challenge results, and technical details of the top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.