Search Results - Inner Mongolia University Library

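Help

Field codes:

T=title (book title, article title), A=author (responsible party), K=subject term, P=publication name, PU=publisher name, O=organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=subject classification number, U=all fields, Y=year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016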
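
As a rough illustration of the rules above, the following Python sketch assembles a query string from field codes and uppercase Boolean operators padded by single spaces, and rebuilds the first example from the help text. The helper names (`term`, `combine`, `group`) are hypothetical and are not part of the library's system; the sketch only builds text and does not contact the catalogue.

```python
# Hypothetical helpers for composing an advanced-search expression.
# They only build a string; nothing here calls the library's website.

FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value condition, e.g. K=图书馆学."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join conditions with an uppercase operator padded by one space."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first example from the help text.
query = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Validating the field code and operator up front is a design choice of this sketch: it catches a lowercase `and` or a mistyped code before the string is pasted into the search box.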


Search condition: "Any field=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,890 records in total; showing records 531-540

Super-Resolution Reconstruction from Bayer-Pattern Spike Streams

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Dong, Yanchen; Xiong, Ruiqin; Zhang, Jian; Yu, Zhaofei; Fan, Xiaopeng; Zhu, Shuyuan; Huang, Tiejun

Affiliations: Peking Univ Sch Comp Sci Beijing Peoples R China; Peking Univ Sch Elect & Comp Engn Beijing Peoples R China; Peking Univ Inst Artificial Intelligence Beijing Peoples R China; Harbin Inst Technol Sch Comp Sci & Technol Harbin Peoples R China; Univ Elect Sci & Technol China Chengdu Peoples R China

ISBN: (print) 9798350353006

Spike camera is a neuromorphic vision sensor that can capture highly dynamic scenes by generating a continuous stream of binary spikes to represent the arrival of photons at very high temporal resolution. Equipped with Bayer color filter array (CFA), color spike camera (CSC) has been invented to capture color information. Although spike camera has already demonstrated great potential for high-speed imaging, its spatial resolution is limited compared with conventional digital cameras. This paper proposes a Color Spike Camera Super-Resolution (CSCSR) network to super-resolve higher-resolution color images from spike camera streams with Bayer CFA. To be specific, we first propose a representation for Bayer-pattern spike streams, exploring local temporal information with global perception to represent the binary data. Then we exploit the CFA layout and sub-pixel level motion to collect temporal pixels for the spatial super-resolution of each color channel. In particular, a residual-based module for feature refinement is developed to reduce the impact of motion estimation errors. Considering color correlation, we jointly utilize the multi-stage temporal-pixel features of color channels to reconstruct the high-resolution color image. Experimental results demonstrate that the proposed scheme can reconstruct satisfactory color images with both high temporal and spatial resolution from low-resolution Bayer-pattern spike streams. The source codes are available at https://***/csycdong/CSCSR.

Keywords: Bayer pattern; Color filter array; Demosaicing; Neuromorphic sensor; Spike camera; Super resolution

Evaluating Confidence Calibration in Endoscopic Diagnosis Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Dehghani, Nikoo; Thijssen, Ayla; van der Zander, Quirine E. W.; Schreuder, Ramon-Michel; Schoon, Erik J.; van der Sommen, Fons; de With, Peter H. N.

Affiliations: Eindhoven Univ Technol Eindhoven Netherlands; Maastricht Univ Med Ctr Maastricht Netherlands; GROW Res Inst Oncol & Reprod Maastricht Netherlands; Catharina Hosp Eindhoven Netherlands; Eindhoven Artificial Intelligence Syst Inst Eindhoven Netherlands

ISBN: (print) 9798350365474

Colorectal polyps are prevalent precursors to colorectal cancer, making their accurate characterization essential for timely intervention and patient outcomes. Deep learning-based computer-aided diagnosis (CADx) systems have shown promising performance in the automated detection and categorization of colorectal polyps (CRP) using endoscopic images. However, alongside the advancement in diagnostic accuracy, the need for reliable and accurate quantification of uncertainty estimates within these systems has become increasingly important. The primary focus of this study is on refining the reliability of computer-aided diagnosis of CRPs within clinical practice. We perform an investigation of widely used model calibration techniques and how they translate into clinical applications, specifically for CRP categorization data. The experiments reveal that the Variational Inference method excels in intra-dataset calibration, but lacks efficiency and inter-dataset generalization. Laplace approximation and temperature scaling methods offer improved calibration across datasets.

Keywords: Bayesian neural networks; computer-aided diagnosis; Confidence calibration; Model reliability

GLaMM: Pixel Grounding Large Multimodal Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rasheed, Hanoona; Maaz, Muhammad; Shaji, Sahal; Shaker, Abdelrahman; Khan, Salman; Cholakkal, Hisham; Anwer, Rao M.; Xing, Eric; Yang, Ming-Hsuan; Khan, Fahad S.

Affiliations: Mohamed Bin Zayed Univ AI Abu Dhabi U Arab Emirates; Australian Natl Univ Canberra ACT Australia; Aalto Univ Espoo Finland; Carnegie Mellon Univ Pittsburgh PA 15213 USA; Univ Calif Merced Merced CA USA; Linkoping Univ Linkoping Sweden; Google Res Mountain View CA USA

ISBN: (print) 9798350353006

Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial LMMs used holistic images and text prompts to generate ungrounded textual responses. Recently, region-level LMMs have been used to generate visually grounded responses. However, they are limited to only referring to a single object category at a time, require users to specify the regions, or cannot offer dense pixel-wise object grounding. In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks. GLaMM not only grounds objects appearing in the conversations but is flexible enough to accept both textual and optional visual prompts (region of interest) as input. This empowers users to interact with the model at various levels of granularity, both in textual and visual domains. Due to the lack of standard benchmarks for the novel setting of visually Grounded Conversation Generation (GCG), we introduce a comprehensive evaluation protocol with our curated grounded conversations. Our proposed GCG task requires densely grounded concepts in natural scenes at a large-scale. To this end, we propose a densely annotated Grounding-anything Dataset (GranD) using our proposed automated annotation pipeline that encompasses 7.5M unique concepts grounded in a total of 810M regions available with segmentation masks. Besides GCG, GLaMM also performs effectively on several downstream tasks, e.g., referring expression segmentation, image and region-level captioning and vision-language conversations.

Keywords: automated dataset annotation; LMM; MLMM; multimodal foundation models; Multimodal LMM; vision and language; vision-language; VLM

DeiT-LT: Distillation Strikes Back for Vision Transformer Training on Long-Tailed Datasets

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Rangwani, Harsh; Mondal, Pradipto; Mishra, Mayank; Asokan, Ashish Ramayee; Babu, R. Venkatesh

Affiliations: Indian Inst Sci Bangalore Karnataka India; Indian Inst Technol Kharagpur W Bengal India

ISBN: (print) 9798350353006

Vision Transformer (ViT) has emerged as a prominent architecture for various computer vision tasks. In ViT, we divide the input image into patch tokens and process them through a stack of self-attention blocks. However, unlike Convolutional Neural Network (CNN), ViT's simple architecture has no informative inductive bias (e.g., locality, etc.). Due to this, ViT requires a large amount of data for pre-training. Various data-efficient approaches (DeiT) have been proposed to train ViT on balanced datasets effectively. However, limited literature discusses the use of ViT for datasets with long-tailed imbalances. In this work, we introduce DeiT-LT to tackle the problem of training ViTs from scratch on long-tailed datasets. In DeiT-LT, we introduce an efficient and effective way of distillation from CNN via distillation DIST token by using out-of-distribution images and re-weighting the distillation loss to enhance focus on tail classes. This leads to the learning of local CNN-like features in early ViT blocks, improving generalization for tail classes. Further, to mitigate overfitting, we propose distilling from a flat CNN teacher, which leads to learning low-rank generalizable features for DIST tokens across all ViT blocks. With the proposed DeiT-LT scheme, the distillation DIST token becomes an expert on the tail classes, and the classifier CLS token becomes an expert on the head classes. The experts help to effectively learn features corresponding to both the majority and minority classes using a distinct set of tokens within the same ViT architecture. We show the effectiveness of DeiT-LT for training ViT from scratch on datasets ranging from small-scale CIFAR-10 LT to large-scale iNaturalist-2018. Project Page: https://***/DeiT-LT.

Keywords: distillation; long-tail-learning; vision transformers; vit

FPN-IAIA-BL: A Multi-Scale Interpretable Deep Learning Model for Classification of Mass Margins in Digital Mammography

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yang, Julia; Barnett, Alina Jade; Donnelly, Jon; Kishore, Satvik; Fang, Jerry; Schwartz, Fides Regina; Chen, Chaofan; Lo, Joseph Y.; Rudin, Cynthia

Affiliations: Duke Univ Durham NC 27708 USA; Brigham & Womens Hosp 75 Francis St Boston MA 02115 USA; Univ Maine Orono ME USA

ISBN: (print) 9798350365474

Digital mammography is essential to breast cancer detection, and deep learning offers promising tools for faster and more accurate mammogram analysis. In radiology and other high-stakes environments, uninterpretable ("black box") deep learning models are unsuitable and there is a call in these fields to make interpretable models. Recent work in interpretable computer vision provides transparency to these formerly black boxes by utilizing prototypes for case-based explanations, achieving high accuracy in applications including mammography. However, these models struggle with precise feature localization, reasoning on large portions of an image when only a small part is relevant. This paper addresses this gap by proposing a novel multi-scale interpretable deep learning model for mammographic mass margin classification. Our contribution not only offers an interpretable model with reasoning aligned with radiologist practices, but also provides a general architecture for computer vision with user-configurable prototypes from coarse- to fine-grained prototypes.

Keywords: breast cancer; cancer; computer vision; deep learning; interpretability; interpretable machine learning; machine learning; mammography; medical imaging; neural networks

MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chumachenko, Kateryna; Iosifidis, Alexandros; Gabbouj, Moncef

Affiliations: Tampere Univ Tampere Finland; Aarhus Univ Aarhus Denmark

ISBN: (print) 9798350365474

Dynamic Facial Expression Recognition (DFER) has received significant interest in recent years, dictated by its pivotal role in enabling empathic and human-compatible technologies. Achieving robustness towards in-the-wild data in DFER is particularly important for real-world applications. One of the directions aimed at improving such models is multimodal emotion recognition based on audio and video data. Multimodal learning in DFER increases the model capabilities by leveraging richer, complementary data representations. Within the field of multimodal DFER, recent methods have focused on exploiting advances of self-supervised learning (SSL) for pre-training of strong multimodal encoders [40]. Another line of research has focused on adapting pre-trained static models for DFER [8]. In this work, we propose a different perspective on the problem and investigate the advancement of multimodal DFER performance by adapting SSL-pre-trained disjoint unimodal encoders. We identify main challenges associated with this task, namely, intra-modality adaptation, cross-modal alignment, and temporal adaptation, and propose solutions to each of them. As a result, we demonstrate improvement over current state-of-the-art on two popular DFER benchmarks, namely DFEW [19] and MAFW [29].

Keywords: audiovisual emotion recognition; dynamic facial expression recognition; facial expression recognition; multi-modal; multimodal adaptation

DELTA: Decoupling Long-Tailed Online Continual Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Raghavan, Siddeshwar; He, Jiangpeng; Zhu, Fengqing

Affiliations: Purdue Univ Sch Elect & Comp Engn W Lafayette IN 47907 USA

ISBN: (print) 9798350365474

A significant challenge in achieving ubiquitous Artificial Intelligence is the limited ability of models to rapidly learn new information in real-world scenarios where data follows long-tailed distributions, all while avoiding forgetting previously acquired knowledge. In this work, we study the under-explored problem of Long-Tailed Online Continual Learning (LTOCL), which aims to learn new tasks from sequentially arriving class-imbalanced data streams. Each sample is observed only once for training without knowing the task data distribution. We present DELTA, a decoupled learning approach designed to enhance learning representations and address the substantial imbalance in LTOCL. We enhance the learning process by adapting supervised contrastive learning to attract similar samples and repel dissimilar (out-of-class) samples. Further, by balancing gradients during training using an equalization loss, DELTA significantly enhances learning outcomes and successfully mitigates catastrophic forgetting. Through extensive evaluation, we demonstrate that DELTA improves the capacity for incremental learning, surpassing existing OCL methods. Our results suggest considerable promise for applying OCL in real-world applications. Code is available online.

Keywords: computer vision; long-tailed image classification; online continual learning

Domain Targeted Synthetic Plant Style Transfer using Stable Diffusion, LoRA and ControlNet

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hartley, Zane K. J.; Lind, Rob J.; Pound, Michael P.; French, Andrew P.

Affiliations: Univ Nottingham Wollaton Rd Nottingham NG8 1BB England; Syngenta Jealotts Hill Int Res Ctr Warfield England

ISBN: (print) 9798350365474

Synthetic images can help alleviate much of the cost in the creation of training data for plant phenotyping-focused AI development. Synthetic-to-real style transfer is of particular interest to users of artificial data because of the domain shift problem created by training neural networks on images generated in a digital environment. In this paper we present a pipeline for synthetic plant creation and image-to-image style transfer, with a particular interest in synthetic to real domain adaptation targeting specific real datasets. Utilizing new advances in generative AI, we employ a combination of Stable Diffusion, Low Ranked Adapters (LoRA) and ControlNets to produce an advanced system of style transfer. We focus our work on the core task of leaf instance segmentation, exploring both synthetic to real style transfer as well as inter-species style transfer and find that our pipeline makes numerous improvements over CycleGAN for style transfer, and the images we produce are comparable to real images when used as training data.

Keywords: Agriculture; Computer Vision; ControlNet; Deep Learning; Diffusion; LoRA; Plant Phenotyping

Confidence-Aware RGB-D Face Recognition via Virtual Depth Synthesis

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Zijian; Wang, Mei; Deng, Weihong; Shi, Hongzhi; Wen, Dongchao; Zhang, Yingjie; Cui, Xingchen; Zhao, Jian

Affiliations: Inspur Beijing Elect Informat Ind Co Ltd Beijing Peoples R China

ISBN: (print) 9798350365474

2D face recognition encounters challenges in unconstrained environments due to varying illumination, occlusion, and pose. Recent studies focus on RGB-D face recognition to improve robustness by incorporating depth information. However, collecting sufficient paired RGB-D training data is expensive and time-consuming, hindering wide deployment. In this work, we first construct a diverse depth dataset generated by 3D Morphable Models for depth model pre-training. Then, we propose a domain-independent pre-training framework that utilizes readily available pre-trained RGB and depth models to separately perform face recognition without needing additional paired data for retraining. To seamlessly integrate the two distinct networks and harness the complementary benefits of RGB and depth information for improved accuracy, we propose an innovative Adaptive Confidence Weighting (ACW). This mechanism is designed to learn confidence estimates for each modality to achieve modality fusion at the score level. Our method is simple and lightweight, only requiring ACW training beyond the backbone models. Experiments on multiple public RGB-D face recognition benchmarks demonstrate state-of-the-art performance surpassing previous methods based on depth estimation and feature fusion, validating the efficacy of our approach.

Keywords: domain-independent pre-training; modality fusion; RGB-D face recognition

FairCLIP: Harnessing Fairness in Vision-Language Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Luo, Yan; Shi, Min; Khan, Muhammad Osama; Afzal, Muhammad Muneeb; Huang, Hao; Yuan, Shuaihang; Tian, Yu; Song, Luo; Kouhana, Ava; Elze, Tobias; Fang, Yi; Wang, Mengyu

Affiliations: Harvard Univ Harvard Ophthalmol AI Lab Cambridge MA 02138 USA; NYU Tandon Sch Engn New York NY USA; New York Univ Abu Dhabi Multimedia & Visual Comp Lab Abu Dhabi U Arab Emirates

ISBN: (print) 9798350353006

Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair vision-language medical dataset (Harvard-FairVLMed) that provides detailed demographic attributes, ground-truth labels, and clinical notes to facilitate an in-depth examination of fairness within VL foundation models. Using Harvard-FairVLMed, we conduct a comprehensive fairness analysis of two widely-used VL models (CLIP and BLIP2), pre-trained on both natural and medical domains, across four different protected attributes. Our results highlight significant biases in all VL models, with Asian, Male, Non-Hispanic, and Spanish being the preferred subgroups across the protected attributes of race, gender, ethnicity, and language, respectively. In order to alleviate these biases, we propose FairCLIP, an optimal-transport-based approach that achieves a favorable trade-off between performance and fairness by reducing the Sinkhorn distance between the overall sample distribution and the distributions corresponding to each demographic group. As the first VL dataset of its kind, Harvard-FairVLMed holds the potential to catalyze advancements in the development of machine learning models that are both ethically aware and clinically effective. Our dataset and code are available at https://***/datasets/

Keywords: AI for Medicine; Equitable Deep Learning; Fairness Learning; Large Language Models; Medical AI; Ophthalmology; Optimal Transport; Vision-Language Learning


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region, postal code 010021
