ISBN (Print): 9798350365474
Efficient Image Super-Resolution (SR) aims to accelerate SR network inference by minimizing computational complexity and network parameters while preserving performance. Existing state-of-the-art efficient image SR methods are based on convolutional neural networks. Few attempts have been made to harness Mamba's long-range modeling capability and efficient computational complexity, which have shown impressive performance on high-level vision tasks. In this paper, we propose DVMSR, a novel lightweight image SR network that incorporates Vision Mamba and a distillation strategy. DVMSR consists of three modules: a feature extraction convolution, multiple stacked Residual State Space Blocks (RSSBs), and a reconstruction module. Specifically, the deep feature extraction module is composed of several RSSBs, each of which contains several Vision Mamba Modules (ViMM) together with a residual connection. To improve efficiency while maintaining comparable performance, we apply a distillation strategy to the Vision Mamba network: we leverage the rich representation knowledge of the teacher network as additional supervision for the output of the lightweight student network. Extensive experiments demonstrate that our proposed DVMSR outperforms state-of-the-art efficient SR methods in terms of model parameters while matching their PSNR and SSIM performance. The source code is available at https://***/nathan66666/***
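The macro-structure described in the abstract maps naturally onto a few small modules. Below is a minimal PyTorch sketch; the real ViMM uses a selective state-space (Mamba) token mixer, for which a depthwise convolution stands in here, so every module below is an illustrative approximation of the described structure, not the authors' implementation.

```python
# Hedged sketch of the DVMSR macro-structure: feature extraction conv ->
# stacked RSSBs (each several ViMMs plus a residual) -> reconstruction,
# trained with an extra feature-distillation term from a frozen teacher.
import torch.nn as nn

class ViMM(nn.Module):
    """Stand-in for a Vision Mamba Module: token mixing plus a channel MLP."""
    def __init__(self, dim):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.mixer = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # placeholder for the SSM scan
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim))

    def forward(self, x):  # x: (B, C, H, W)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)  # (B, HW, C)
        mixed = self.mixer(self.norm1(t).transpose(1, 2).reshape(b, c, h, w))
        t = t + mixed.flatten(2).transpose(1, 2)
        t = t + self.mlp(self.norm2(t))
        return t.transpose(1, 2).reshape(b, c, h, w)

class RSSB(nn.Module):
    """Residual State Space Block: several ViMMs wrapped in a residual connection."""
    def __init__(self, dim, n_vimm=2):
        super().__init__()
        self.body = nn.Sequential(*[ViMM(dim) for _ in range(n_vimm)])

    def forward(self, x):
        return x + self.body(x)

class DVMSR(nn.Module):
    """Feature extraction conv -> stacked RSSBs -> pixel-shuffle reconstruction."""
    def __init__(self, dim=48, n_blocks=4, scale=4):
        super().__init__()
        self.head = nn.Conv2d(3, dim, 3, padding=1)
        self.body = nn.Sequential(*[RSSB(dim) for _ in range(n_blocks)])
        self.tail = nn.Sequential(
            nn.Conv2d(dim, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr):
        f = self.head(lr)
        return self.tail(f + self.body(f))

# Feature distillation: a frozen teacher's representations supervise the student.
def distill_loss(student_feat, teacher_feat):
    return nn.functional.l1_loss(student_feat, teacher_feat.detach())
```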
ISBN (Print): 9798350365474
To compete with existing mobile architectures, MobileViG introduced Sparse Vision Graph Attention (SVGA), a fast token-mixing operator based on the principles of GNNs. However, MobileViG scales poorly with model size, falling as much as 1% behind models of similar latency. This paper introduces Mobile Graph Convolution (MGC), a new vision graph neural network (ViG) module that solves this scaling problem. Our proposed mobile vision architecture, MobileViGv2, uses MGC to demonstrate the effectiveness of our approach. MGC improves on SVGA by increasing graph sparsity and introducing conditional positional encodings to the graph operation. Our smallest model, MobileViGv2-Ti, achieves 77.7% top-1 accuracy on ImageNet-1K, 2% higher than MobileViG-Ti, with 0.9 ms inference latency on the iPhone 13 Mini NPU. Our largest model, MobileViGv2-B, achieves 83.4% top-1 accuracy, 0.8% higher than MobileViG-B, with 2.7 ms inference latency. Besides image classification, we show that MobileViGv2 generalizes well to other tasks. For object detection and instance segmentation on MS COCO 2017, MobileViGv2-M outperforms MobileViG-M by 1.2 AP(box) and 0.7 AP(mask), and MobileViGv2-B outperforms MobileViG-B by 1.0 AP(box) and 0.7 AP(mask). For semantic segmentation on ADE20K, MobileViGv2-M achieves 42.9% mIoU and MobileViGv2-B achieves 44.3% mIoU.
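For intuition, here is a hedged PyTorch sketch of the two ingredients MGC adds on top of SVGA: a conditional positional encoding (here a depthwise convolution whose output is added back to the features) and a sparser static graph that links each token only to tokens a fixed stride away along its row and column. The stride, hop count, and max-relative aggregation are illustrative assumptions, not the paper's exact design.

```python
# Sketch of an MGC-style block: conditional positional encoding, then
# max-relative graph aggregation over a sparse strided row/column graph.
import torch
import torch.nn as nn

class MGCBlock(nn.Module):
    def __init__(self, dim, hops=2, stride=8):
        super().__init__()
        self.cpe = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # conditional positional encoding
        self.fc1 = nn.Conv2d(dim, dim, 1)
        self.fc2 = nn.Conv2d(2 * dim, dim, 1)  # fuses each node with its aggregated neighbors
        self.hops, self.stride = hops, stride

    def forward(self, x):  # x: (B, C, H, W)
        x = x + self.cpe(x)  # positions conditioned on local content
        h = self.fc1(x)
        rel = torch.zeros_like(h)
        # max-relative aggregation over the sparse strided row/column graph
        for i in range(1, self.hops + 1):
            for shift, axis in ((i * self.stride, 2), (-i * self.stride, 2),
                                (i * self.stride, 3), (-i * self.stride, 3)):
                rel = torch.maximum(rel, torch.roll(h, shift, dims=axis) - h)
        return x + self.fc2(torch.cat([h, rel], dim=1))
```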
ISBN (Print): 9798350365474
Action quality assessment (AQA) applies computer vision to quantitatively assess the performance or execution of a human action. Current AQA approaches are end-to-end neural models, which lack transparency and tend to be biased because they are trained on subjective human judgements as ground truth. To address these issues, we introduce a neuro-symbolic paradigm for AQA, which uses neural networks to abstract interpretable symbols from video data and makes quality assessments by applying rules to those symbols. We take diving as the case study. We found that domain experts prefer our system and find it more informative than purely neural approaches to AQA in diving. Our system also achieves state-of-the-art action recognition and temporal segmentation, and automatically generates a detailed report that breaks the dive down into its elements and provides objective scoring with visual evidence. As verified by a group of domain experts, this report may be used to assist judges in scoring, help train judges, and provide feedback to divers. Annotated training data and code: https://***/laurenok24/NSAQA.
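The neuro-symbolic split can be made concrete with a toy example: neural models would populate the symbols (pose angles, splash size, etc.), and transparent hand-written rules turn them into deductions with an auditable report. All symbol names and thresholds below are invented purely for illustration and are not the paper's actual rule set.

```python
# Toy neuro-symbolic scoring: symbols come from (hypothetical) neural
# perception modules; rules apply deductions and explain each one.
from dataclasses import dataclass

@dataclass
class DiveSymbols:
    knee_angle_deg: float   # e.g. from a pose-estimation network
    entry_angle_deg: float  # body angle at water entry
    splash_area: float      # e.g. from a splash-segmentation network, as image fraction

def score_dive(sym: DiveSymbols) -> tuple[float, list[str]]:
    score, report = 10.0, []
    if sym.knee_angle_deg < 170:  # rule: legs should stay straight
        score -= 0.5
        report.append(f"Bent knees ({sym.knee_angle_deg:.0f} deg): -0.5")
    if abs(90 - sym.entry_angle_deg) > 10:  # rule: entry should be near vertical
        score -= 1.0
        report.append(f"Over/under-rotated entry ({sym.entry_angle_deg:.0f} deg): -1.0")
    if sym.splash_area > 0.05:  # rule: splash should be small
        score -= 0.5
        report.append("Large splash: -0.5")
    return score, report

print(score_dive(DiveSymbols(knee_angle_deg=160, entry_angle_deg=78, splash_area=0.08)))
```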
ISBN (Print): 9798350365474
In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state of the art in both accuracy and fairness. Through the challenge, the image captioning models were tested on a new evaluation dataset that includes a large variety of visual concepts from many domains. No specific training data was provided for the challenge, so entries were required to adapt to new types of image descriptions that had not been seen during training. This report covers the newly proposed NICE dataset, the evaluation methods, the challenge results, and technical details of the top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks.
ISBN (Print): 9798350365474
Accurate motion capture is useful for sports motion analysis but incurs high acquisition costs. Monocular or few-camera multi-view pose estimation provides an accessible but less accurate alternative, especially for sports motion, because models are trained on datasets of daily activities. In addition, multi-view estimation is still costly due to camera calibration. It is therefore desirable to develop an accurate and cost-effective motion capture system for daily training in sports. In this paper, we propose an accurate and convenient sports motion capture system based on unsupervised fine-tuning. The proposed system estimates 3D joint positions by multi-view estimation with automatic calibration against the human body. These results are used as pseudo-labels for fine-tuning a recent high-performance monocular 3D pose estimation model. Since fine-tuning improves the model's accuracy on sports motion, we can choose multi-view or monocular estimation depending on the situation. We evaluated the system on a running motion dataset and ASPset-510, and showed that fine-tuning improves monocular estimation to the same level as multi-view estimation for running motion. Our proposed system can be useful for daily motion analysis in sports.
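The fine-tuning stage reduces to a standard supervised loop in which joints triangulated by the auto-calibrated multi-view system act as pseudo-labels. A minimal PyTorch sketch, with `monocular_model` and the data loader as placeholders for the paper's actual components:

```python
# Pseudo-label fine-tuning: multi-view triangulated 3D joints supervise a
# monocular 3D pose model; no human annotation is involved.
import torch

def finetune(monocular_model, loader, epochs=10, lr=1e-5):
    opt = torch.optim.Adam(monocular_model.parameters(), lr=lr)
    monocular_model.train()
    for _ in range(epochs):
        for frames, pseudo_joints3d in loader:  # pseudo_joints3d: (B, J, 3) from multi-view
            pred = monocular_model(frames)      # (B, J, 3) monocular prediction
            # mean per-joint position error (MPJPE) against the pseudo-labels
            loss = (pred - pseudo_joints3d).norm(dim=-1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return monocular_model
```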
ISBN (Print): 9798350365474
This paper summarizes the 3rd NTIRE challenge on stereo image super-resolution (SR), with a focus on new solutions and results. The task of this challenge is to super-resolve a low-resolution stereo image pair to a high-resolution one with a magnification factor of ×4 under a limited computational budget. Compared with single-image SR, the major difficulty lies in how to exploit the additional information in the other viewpoint and how to maintain stereo consistency in the results. The challenge has two tracks: one on bicubic degradation and one on real degradations. In total, 108 and 70 participants successfully registered for the two tracks, respectively. In the test phase, 14 and 13 teams submitted valid results with PSNR (RGB) scores better than the baseline. This challenge establishes a new benchmark for stereo image SR.
ISBN (Print): 9798350365474
Medical image captioning plays an important role in modern healthcare, improving clinical report generation and aiding radiologists in detecting abnormalities and reducing misdiagnosis. The complex biases in visual and textual data make this task challenging. Recent advancements in transformer-based models have significantly improved the generation of radiology reports from medical images. However, these models require substantial computational resources for training and have been observed to produce unnatural language outputs when trained solely on raw image-text pairs. Our aim is to generate more detailed, image-specific reports and to explain the reasoning behind the generated text through image-text alignment. Given the high computational demands of end-to-end model training, we introduce a two-step training methodology with the Intelligent Visual Encoder for Bridging Modalities in Report Generation (InVERGe) model. This model incorporates a lightweight transformer, the Cross-Modal Query Fusion Layer (CMQFL), which uses the output of a frozen encoder to identify the most relevant text-grounded image embedding. This layer bridges the gap between the encoder and the decoder, significantly reducing the workload on the decoder and enhancing the alignment between vision and language. Experimental results on the MIMIC-CXR, Indiana University chest X-ray, and CDD-CESM breast image datasets demonstrate the effectiveness of our approach. Code: https://***/labsroy007/InVERGe
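As described, the CMQFL resembles a small query-based fusion transformer: a fixed set of learnable queries cross-attends to the frozen encoder's image tokens and emits a compact embedding for the report decoder. The sketch below makes that reading concrete; the dimensions and single-layer design are assumptions, not the published architecture.

```python
# Hedged sketch of a query-fusion layer bridging a frozen image encoder
# and a language decoder, in the spirit of the CMQFL description.
import torch
import torch.nn as nn

class CMQFL(nn.Module):
    def __init__(self, dim=768, n_queries=32, n_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, image_feats):  # image_feats: (B, N, dim) from a *frozen* encoder
        q = self.queries.unsqueeze(0).expand(image_feats.size(0), -1, -1)
        out, _ = self.cross_attn(q, image_feats, image_feats)  # queries attend to image tokens
        out = self.norm(q + out)
        return out + self.mlp(out)  # (B, n_queries, dim), handed to the language decoder
```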
ISBN (Print): 9798350365474
We propose a weakly supervised approach for creating maps using free-form textual descriptions. We refer to this task of creating textual maps as zero-shot mapping. Prior works have approached mapping tasks by developing models that predict a fixed set of attributes from overhead imagery. However, these models are very restrictive, as they can only solve the highly specific tasks for which they were trained. Mapping text, on the other hand, allows us to solve a large variety of mapping problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset with 6.1M pairs of overhead and ground-level images. For a given location and overhead image, our model predicts the expected CLIP embedding of the ground-level scenery. The predicted CLIP embeddings are then used to learn about the textual space associated with that location. Sat2Cap is also conditioned on date-time information, allowing it to model temporally varying concepts at a location. Our experimental results demonstrate that our models successfully capture ground-level concepts and allow large-scale mapping of fine-grained textual queries. Our approach does not require any text-labeled data, making the training easily scalable. The code, dataset, and models will be made publicly available.
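The contrastive framework can be read as a symmetric InfoNCE loss between the trainable overhead encoder's output and frozen CLIP embeddings of the paired ground-level images. A minimal PyTorch sketch; the temperature value and the exact loss form are assumptions:

```python
# Symmetric InfoNCE over a batch: each overhead embedding should match the
# frozen CLIP embedding of its co-located ground-level image.
import torch
import torch.nn.functional as F

def sat2cap_loss(overhead_emb, ground_clip_emb, temperature=0.07):
    # overhead_emb:     (B, D) from the trainable overhead-image encoder
    # ground_clip_emb:  (B, D) from frozen CLIP on the paired ground image
    a = F.normalize(overhead_emb, dim=-1)
    b = F.normalize(ground_clip_emb, dim=-1)
    logits = a @ b.t() / temperature  # (B, B) pairwise similarities
    targets = torch.arange(a.size(0), device=a.device)  # matching pairs on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```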
ISBN (Print): 9798350365474
Recent vision foundation models (VFMs) have demonstrated proficiency in various tasks but require supervised fine-tuning to perform semantic segmentation effectively. Benchmarking their performance is essential for selecting current models and guiding future model development for this task. The lack of a standardized benchmark complicates comparisons; therefore, the primary objective of this paper is to study how VFMs should be benchmarked for semantic segmentation. To do so, various VFMs are fine-tuned under various settings, and the impact of individual settings on the performance ranking and training time is assessed. Based on the results, the recommendation is to fine-tune the ViT-B variants of VFMs with a 16×16 patch size and a linear decoder, as these settings are representative of using a larger model, a more advanced decoder, and a smaller patch size, while reducing training time by more than a factor of 13. Using multiple datasets for training and evaluation is also recommended, as the performance ranking varies across datasets and domain shifts. Linear probing, a common practice for some VFMs, is not recommended, as it is not representative of end-to-end fine-tuning. The benchmarking setup recommended in this paper enables a performance analysis of VFMs for semantic segmentation. This analysis reveals that pretraining with promptable segmentation is not beneficial, whereas masked image modeling (MIM) with abstract representations is crucial, even more important than the type of supervision used. The code for efficiently fine-tuning VFMs for semantic segmentation can be accessed through the project page.
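The recommended setup (ViT-B, 16×16 patches, linear decoder) is simple to reproduce. Below is a hedged PyTorch sketch using a generic timm ViT-B checkpoint as a stand-in for a VFM; the linear decoder is a 1×1 convolution over patch tokens followed by bilinear upsampling.

```python
# ViT-B backbone + linear decoder for semantic segmentation, fine-tuned
# end-to-end (not linear-probed), per the paper's recommendation.
import timm
import torch.nn as nn
import torch.nn.functional as F

class LinearDecoderSeg(nn.Module):
    def __init__(self, backbone, dim=768, n_classes=150, patch=16):  # 150 = ADE20K classes
        super().__init__()
        self.backbone, self.patch = backbone, patch
        self.head = nn.Conv2d(dim, n_classes, 1)  # the "linear decoder"

    def forward(self, x):
        b, _, h, w = x.shape
        tokens = self.backbone.forward_features(x)  # (B, prefix + N, dim) in timm ViTs
        tokens = tokens[:, self.backbone.num_prefix_tokens:]  # drop CLS/register tokens
        feat = tokens.transpose(1, 2).reshape(b, -1, h // self.patch, w // self.patch)
        return F.interpolate(self.head(feat), size=(h, w), mode="bilinear", align_corners=False)

backbone = timm.create_model("vit_base_patch16_224", pretrained=True)
model = LinearDecoderSeg(backbone)
```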
ISBN (Print): 9798350365474
Face morphing attacks pose severe threats to Face Recognition Systems (FRS) operated in border control and passport issuance use cases. Correspondingly, morphing attack detection (MAD) algorithms are needed to defend against such attacks. MAD approaches must be robust enough to handle unknown attacks in an open-set scenario, where attacks can originate from various morphing generation algorithms, post-processing steps, and a diversity of printers/scanners. The generalization problem is further pronounced when the detection has to be made on a single suspected image. In this paper, we propose a generalized single-image-based MAD (S-MAD) algorithm that learns its encoding from a Vision Transformer (ViT) architecture. Compared to CNN-based architectures, the ViT model has the advantage of integrating local and global information and is therefore well suited to detecting morphing traces distributed widely across the face region. Extensive experiments are carried out on face morphing datasets generated from the publicly available FRGC face dataset. Several state-of-the-art (SOTA) MAD algorithms, including representative publicly evaluated ones, were selected and benchmarked against our ViT-based approach. The results demonstrate improved detection performance of the proposed S-MAD method under the inter-dataset testing protocol (different data used for training and testing) and comparable performance under intra-dataset testing (the same data used for training and testing).
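At inference time, the S-MAD formulation amounts to a single-image binary classifier on top of a ViT encoder. A minimal sketch; the timm checkpoint and two-class head below are illustrative stand-ins for the trained model, not the paper's exact recipe.

```python
# ViT-based single-image morphing attack detection: encode the suspected
# face image and score it as bona fide vs. morphed.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
model.eval()

@torch.no_grad()
def morph_score(face_batch):  # face_batch: (B, 3, 224, 224), ImageNet-normalized
    logits = model(face_batch)
    return logits.softmax(dim=-1)[:, 1]  # probability that the image is morphed
```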