Search Results - Inner Mongolia University Library

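Search help

Field codes:

- T = Title (book title or item title)
- A = Author (responsible party)
- K = Subject term
- P = Publication name
- PU = Publisher name
- O = Institution (author affiliation, degree-granting institution, patent applicant)
- L = Chinese Library Classification (CLC) number
- C = Subject classification code
- U = All fields
- Y = Year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with exactly one space on each side.

Examples:

Example 1 (subject library science or information science, author Fan Bingsi, 1982-2016): (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2 (publication Computer Applications and Software, any field C++ or Basic, excluding subject Visual, 2011-2016): P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016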
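To make the rules above concrete, here is a small Python sketch that assembles query strings in this syntax. The helper names (`term`, `check_op`, `group`, `build`) are hypothetical and are not part of the library's search system; they only enforce what is documented here: the listed field codes, uppercase `AND`/`OR`/`NOT`, and one space on each side of an operator. Running it should reproduce the two examples verbatim.

```python
# Hypothetical helpers for composing queries in the field-code syntax above.
# They only enforce the documented rules; they do not contact the catalogue.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def check_op(op: str) -> str:
    """Operators must be uppercase AND, OR or NOT; reject anything else."""
    if op not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}: {op!r}")
    return op


def group(*terms: str, op: str = "OR") -> str:
    """Join terms with a single operator and wrap them in parentheses."""
    return "(" + f" {check_op(op)} ".join(terms) + ")"


def build(first: str, *rest: tuple[str, str]) -> str:
    """Chain clauses left to right: first OP clause OP clause ..."""
    parts = [first]
    for op, clause in rest:
        # One space on each side of every operator, as the rules require.
        parts.append(f"{check_op(op)} {clause}")
    return " ".join(parts)


example1 = build(
    group(term("K", "图书馆学"), term("K", "情报学")),
    ("AND", term("A", "范并思")),
    ("AND", term("Y", "1982-2016")),
)
example2 = build(
    term("P", "计算机应用与软件"),
    ("AND", group(term("U", "C++"), term("U", "Basic"))),
    ("NOT", term("K", "Visual")),
    ("AND", term("Y", "2011-2016")),
)
print(example1)
print(example2)
```

Rejecting a lowercase operator instead of silently uppercasing it mirrors the rule that operators must be written in capitals.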

Refine results

Document type

11,267 conference papers
14 journal articles

Holdings scope

11,281 electronic documents
0 print holdings

Subject classification

7,859 Engineering
- 7,418 Computer Science and Technology...
- 799 Mechanical Engineering
- 390 Electrical Engineering
- 377 Software Engineering
- 224 Control Science and Engineering
- 68 Optical Engineering
- 32 Information and Communication Engineering
- 26 Bioengineering
- 10 Biomedical Engineering (may confer...
- 8 Chemical Engineering and Technology
- 7 Electronic Science and Technology (may...
- 6 Transportation Engineering
- 5 Safety Science and Engineering
- 3 Instrument Science and Technology
- 2 Mechanics (may confer Engineering, Sci...
- 2 Materials Science and Engineering (may...
- 2 Power Engineering and Engineering Thermo...
- 2 Aeronautical and Astronautical Science and Tech...
3,103 Medicine
- 3,102 Clinical Medicine
- 4 Basic Medicine (may confer Medicine...
297 Science
- 199 Systems Science
- 69 Physics
- 27 Biology
- 24 Mathematics
- 9 Statistics (may confer Science, ...
- 7 Chemistry
23 Management
- 14 Library, Information and Archives Manag...
- 9 Management Science and Engineering (may...
- 4 Business Administration
6 Law
- 6 Sociology
2 Agriculture
1 Education
1 Art

Topics

5,461 computer vision
2,564 training
2,118 pattern recognit...
1,632 computational mo...
1,454 visualization
1,325 three-dimensiona...
1,070 semantics
972 codes
968 benchmark testin...
930 computer archite...
885 deep learning
831 task analysis
729 feature extracti...
541 conferences
530 neural networks
526 face recognition
503 transformers
480 object detection
478 image segmentati...
469 cameras

Institutions

169 univ sci & techn...
146 tsinghua univ pe...
142 univ chinese aca...
142 carnegie mellon ...
132 chinese univ hon...
122 peng cheng lab p...
102 zhejiang univ pe...
96 sensetime res pe...
95 swiss fed inst t...
90 shanghai ai lab ...
86 tsinghua univers...
86 stanford univ st...
84 shanghai jiao to...
80 zhejiang univers...
79 alibaba grp peop...
79 univ hong kong p...
76 peng cheng labor...
76 tech univ munich...
74 australian natl ...
73 peking univ peop...

Authors

67 timofte radu
60 van gool luc
50 zhang lei
43 yang yi
36 loy chen change
36 tao dacheng
31 liu yang
30 zhou jie
30 chen chen
30 tian qi
29 sun jian
28 zha zheng-jun
27 qi tian
27 boxin shi
26 li xin
26 vasconcelos nuno
26 pollefeys marc
24 liu xiaoming
24 zheng wei-shi
24 luo ping

Language

11,273 English
7 Other
1 Chinese

Search condition: Any field = "2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020"

11,281 records in total; showing records 301-310

OTE: Exploring Accurate Scene Text Recognition Using One Token

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xu, Jianjun; Wang, Yuxin; Xie, Hongtao; Zhang, Yongdong. Affiliation: Univ Sci & Technol China, Hefei, Peoples R China

ISBN (print): 9798350353006

In this paper, we propose a novel framework to fully exploit the potential of a single vector for scene text recognition (STR). Different from previous sequence-to-sequence methods that rely on a sequence of visual tokens to represent scene text images, we prove that just one token is enough to characterize the entire text image and achieve accurate text recognition. Based on this insight, we introduce a new paradigm for STR, called One Token rEcognizer (OTE). Specifically, we implement an image-to-vector encoder to extract the fine-grained global semantics, eliminating the need for sequential features. Furthermore, an elegant yet potent vector-to-sequence decoder is designed to adaptively diffuse global semantics to corresponding character locations, enabling both autoregressive and non-autoregressive decoding schemes. By executing decoding within a high-level representational space, our vector-to-sequence (V2S) approach avoids the alignment issues between visual tokens and character embeddings prevalent in traditional sequence-to-sequence methods. Remarkably, due to introducing character-wise fine-grained information, such global tokens also boost the performance of scene text retrieval tasks. Extensive experiments on synthetic and real datasets demonstrate the effectiveness of our method by achieving new state-of-the-art results on various public STR benchmarks. Our code is available at https://***/Xu-Jianjun/OTE.

Keywords: CLIP; Scene Text Recognition; Scene Text Retrieval

Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Addepalli, Sravanti; Asokan, Ashish Ramayee; Sharma, Lakshay; Babu, R. Venkatesh. Affiliation: Indian Inst Sci, Vision & AI Lab, Bangalore, Karnataka, India

ISBN (print): 9798350353006

Vision-Language Models (VLMs) such as CLIP are trained on large amounts of image-text pairs, resulting in remarkable generalization across several data distributions. However, in several cases, their expensive training and data collection/curation costs do not justify the end application. This motivates a vendor-client paradigm, where a vendor trains a large-scale VLM and grants only input-output access to clients on a pay-per-query basis in a black-box setting. The client aims to minimize inference cost by distilling the VLM to a student model using the limited available task-specific data, and further deploying this student model in the downstream application. While naive distillation largely improves the In-Domain (ID) accuracy of the student, it fails to transfer the superior out-of-distribution (OOD) generalization of the VLM teacher using the limited available labeled images. To mitigate this, we propose Vision-Language to Vision - Align, Distill, Predict (VL2V-ADiP), which first aligns the vision and language modalities of the teacher model with the vision modality of a pre-trained student model, and further distills the aligned VLM representations to the student. This maximally retains the pre-trained features of the student, while also incorporating the rich representations of the VLM image encoder and the superior generalization of the text embeddings. The proposed approach achieves state-of-the-art results on the standard Domain Generalization benchmarks in a black-box teacher setting as well as a white-box setting where the weights of the VLM are accessible. Project page: http://***/VL2V-ADiP/

Keywords: CLIP; Distillation; Domain Generalization; OOD Generalization; Vision-Language Models

Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mohan, Akshatha; Peeples, Joshua. Affiliation: Texas A&M Univ, Dept Elect & Comp Engn, College Stn, TX 77840, USA

ISBN (print): 9798350365474

Pooling layers (e.g., max and average) may overlook important information encoded in the spatial arrangement of pixel intensity and/or feature values. We propose a novel lacunarity pooling layer that aims to capture the spatial heterogeneity of the feature maps by evaluating the variability within local windows. The layer operates at multiple scales, allowing the network to adaptively learn hierarchical features. The lacunarity pooling layer can be seamlessly integrated into any artificial neural network architecture. Experimental results demonstrate the layer's effectiveness in capturing intricate spatial patterns, leading to improved feature extraction capabilities. The proposed approach holds promise in various domains, especially in agricultural image analysis tasks. This work contributes to the evolving landscape of artificial neural network architectures by introducing a novel pooling layer that enriches the representation of spatial features. Our code is publicly available.

Keywords: Computer Vision; Image Classification; Machine Learning; Texture Analysis
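The abstract does not give the layer's formula. As a rough illustration only, the sketch below pools a 2-D feature map with the classic gliding-box lacunarity, E[v²] / E[v]² of the values inside each window, which grows as the mass in a window becomes less uniform. Treating this as what "variability within local windows" means is an assumption, not the authors' layer; the name `lacunarity_pool` and its parameters are hypothetical, and the multi-scale behaviour is imitated simply by pooling at more than one window size.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def lacunarity_pool(x: np.ndarray, kernel: int = 2, stride: int = 2,
                    eps: float = 1e-6) -> np.ndarray:
    """Hypothetical pooling of a 2-D map by the lacunarity of each window.

    Lacunarity here is E[v^2] / E[v]^2 over the values in a window: a flat
    window gives about 1, a window whose mass sits in a few pixels gives a
    larger value. An all-zero window returns 0 because eps keeps the
    denominator finite (lacunarity is undefined there).
    """
    # Shape (H - k + 1, W - k + 1, k, k); keep every `stride`-th window.
    windows = sliding_window_view(x, (kernel, kernel))[::stride, ::stride]
    mean = windows.mean(axis=(-2, -1))
    mean_sq = (windows ** 2).mean(axis=(-2, -1))
    return mean_sq / (mean ** 2 + eps)


feature_map = np.zeros((4, 4))
feature_map[:2, 2:] = 1.0   # top-right 2x2 window: uniform values
feature_map[2, 0] = 1.0     # bottom-left 2x2 window: one bright pixel

# Crude multi-scale view: the same map pooled at two window sizes.
for k in (2, 4):
    print(k, lacunarity_pool(feature_map, kernel=k, stride=k).round(3))
```

In a trainable network the same computation would run per channel on a tensor library; NumPy is used here only to keep the arithmetic visible.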

IrrNet: Spatio-Temporal Segmentation guided Classification for Irrigation Mapping

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Author: Hoque, Oishee Bintey. Affiliation: Univ Virginia, Dept Comp Sci, Charlottesville, VA 22903, USA

ISBN (print): 9798350365474

Irrigation systems can vary widely in scale, from small-scale subsistence farming to large commercial agriculture (see Fig. 1). The heterogeneity in irrigation practices and systems across different regions adds to the complexity of mapping (see Fig. 1). Distinguishing between irrigated and non-irrigated areas is challenging due to the spectral characteristics of various irrigation systems and practices across different regions, further complicating the task of mapping different types of irrigation. For example, rainfed agriculture is prevalent in the Midwest, Southeast, and parts of the Northeast U.S., while irrigation is common in arid Western and Southwestern states. Rainfed farming can result in highly variable patterns of cultivation. Farmers may practice rainfed agriculture in some fields while irrigating others, leading to a complex mosaic of irrigated and non-irrigated areas within the same region. © 2024 IEEE.

Keywords: Deep Learning; Irrigation Mapping; MTL; Remote Sensing; Segmentation; Transfer Learning; Vision in Agriculture

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jain, Jitesh; Yang, Jianwei; Shi, Humphrey. Affiliations: Georgia Tech, SHI Labs, Atlanta, GA 30332, USA; Microsoft Res, Redmond, WA, USA; Picsart AI Res (PAIR), Atlanta, GA, USA

ISBN (print): 9798350353006

Humans possess the remarkable skill of Visual Perception, the ability to see and understand the seen, helping them make sense of the visual world and, in turn, reason. Multimodal Large Language Models (MLLM) have recently achieved impressive performance on vision-language tasks ranging from visual question-answering and image captioning to visual reasoning and image generation. However, when prompted to identify or count (perceive) the entities in a given image, existing MLLM systems fail. Working towards developing an accurate MLLM system for perception and reasoning, we propose using Versatile Vision enCoders (VCoder) as perception eyes for Multimodal LLMs. We feed the VCoder with perception modalities such as segmentation or depth maps, improving the MLLM's perception abilities. Secondly, we leverage the images from COCO and outputs from off-the-shelf vision perception models to create our COCO Segmentation Text (COST) dataset for training and evaluating MLLMs on the object perception task. Thirdly, we introduce metrics to assess the object perception abilities in MLLMs on our COST dataset. Lastly, we provide extensive experimental evidence proving the VCoder's improved object-level perception skills over existing Multimodal LLMs, including GPT-4V. We open-source our dataset, code, and models to promote research.

Keywords: Visual languages

MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cao, Xu; Zhou, Tong; Ma, Yunsheng; Ye, Wenqian; Cui, Can; Tang, Kun; Cao, Zhipeng; Liang, Kaizhao; Wang, Ziran; Rehg, James M.; Zheng, Chao. Affiliations: Tencent T Lab, Palo Alto, CA 94306, USA; Univ Illinois, Champaign, IL, USA; Purdue Univ, W Lafayette, IN, USA; Univ Virginia, Charlottesville, VA, USA; SambaNova Syst Inc, Palo Alto, CA, USA

ISBN (print): 9798350353006

Vision-language generative AI has demonstrated remarkable promise for empowering cross-modal scene understanding of autonomous driving and high-definition (HD) map systems. However, current benchmark datasets lack multi-modal point cloud, image, and language data pairs. Recent approaches utilize visual instruction learning and cross-modal prompt engineering to expand vision-language models into this domain. In this paper, we propose a new vision-language benchmark that can be used to finetune traffic and HD map domain-specific foundation models. Specifically, we annotate and leverage large-scale, broad-coverage traffic and map data extracted from huge HD map annotations, and use CLIP and LLaMA-2 / Vicuna to finetune a baseline model with instruction-following data. Our experimental results across various algorithms reveal that while visual instruction-tuning large language models (LLMs) can effectively learn meaningful representations from MAPLM-QA, there remains significant room for further advancements. To facilitate applying LLMs and multi-modal data into self-driving research, we will release our visual-language QA data, and the baseline models at ***/LLVM-AD/MAPLM.

Keywords: High-definition (HD) Map; Large Language Model; Multimodal Learning; Vision-Language Model; Visual Question Answering

TIM: A Time Interval Machine for Audio-Visual Action Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chalk, Jacob; Huh, Jaesung; Kazakos, Evangelos; Zisserman, Andrew; Damen, Dima. Affiliations: Univ Bristol, Bristol, Avon, England; Univ Oxford, VGG, Oxford, England; Czech Tech Univ, Prague, Czech Republic

ISBN (print): 9798350353006

Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalities in long videos by explicitly modelling the temporal extents of audio and visual events. We propose the Time Interval Machine (TIM) where a modality-specific time interval poses as a query to a transformer encoder that ingests a long video input. The encoder then attends to the specified interval, as well as the surrounding context in both modalities, in order to recognise the ongoing action. We test TIM on three long audio-visual video datasets: EPIC-KITCHENS, Perception Test, and AVE, reporting state-of-the-art (SOTA) for recognition. On EPIC-KITCHENS, we beat previous SOTA that utilises LLMs and significantly larger pre-training by 2.9% top-1 action recognition accuracy. Additionally, we show that TIM can be adapted for action detection, using dense multi-scale interval queries, outperforming SOTA on EPIC-KITCHENS-100 for most metrics, and showing strong performance on the Perception Test. Our ablations show the critical role of integrating the two modalities and modelling their time intervals in achieving this performance. Code and models at: https://***/JacobChalk/TIM.

Keywords: action detection; action recognition; audio-visual learning; egocentric videos; video understanding

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Yangyi; Sikka, Karan; Cogswell, Michael; Ji, Heng; Divakaran, Ajay. Affiliations: SRI Int, Menlo Pk, CA 94025, USA; Univ Illinois, Champaign, IL 61820, USA

ISBN (print): 9798350353006

We present DRESS, a large vision-language model (LVLM) that innovatively exploits Natural Language feedback (NLF) from Large Language Models to enhance its alignment and interactions by addressing two key limitations in the state-of-the-art LVLMs. First, prior LVLMs generally rely only on the instruction finetuning stage to enhance alignment with human preferences. Without incorporating extra feedback, they are still prone to generate unhelpful, hallucinated, or harmful responses. Second, while the visual instruction tuning data is generally structured in a multi-turn dialogue format, the connections and dependencies among consecutive conversational turns are weak. This reduces the capacity for effective multi-turn interactions. To tackle these, we propose a novel categorization of the NLF into two key types: critique and refinement. The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences. The refinement NLF offers concrete suggestions for improvement and is adopted to improve the interaction ability of the LVLMs, which focuses on LVLMs' ability to refine responses by incorporating feedback in multi-turn interactions. To address the non-differentiable nature of NLF, we generalize conditional reinforcement learning for training. Our experimental results demonstrate that DRESS can generate more helpful (9.76%), honest (11.52%), and harmless (21.03%) responses, and more effectively learn from feedback during multi-turn interactions compared to SOTA LVLMs.

Keywords: Alignment; Interaction; Large Vision Language Models; Natural Language Feedback

Prompting Vision Foundation Models for Pathology Image Analysis

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yin, Chong; Liu, Siqi; Zhou, Kaiyang; Wong, Vincent Wai-Sun; Yuen, Pong C. Affiliations: Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China; Chinese Univ Hong Kong Shenzhen, Shenzhen Res Inst Big Data, Shenzhen, Peoples R China; Chinese Univ Hong Kong, Dept Med & Therapeut, Hong Kong, Peoples R China

ISBN (print): 9798350353006

The rapid increase in cases of non-alcoholic fatty liver disease (NAFLD) in recent years has raised significant public concern. Accurately identifying tissue alteration regions is crucial for the diagnosis of NAFLD, but this task presents challenges in pathology image analysis, particularly with small-scale datasets. Recently, the paradigm shift from full fine-tuning to prompting in adapting vision foundation models has offered a new perspective for small-scale data analysis. However, existing prompting methods based on task-agnostic prompts are mainly developed for generic image recognition, which fall short in providing instructive cues for complex pathology images. In this paper, we propose Quantitative Attribute-based Prompting (QAP), a novel prompting method specifically for liver pathology image analysis. QAP is based on two quantitative attributes, namely K-function-based spatial attributes and histogram-based morphological attributes, which are aimed at quantitative assessment of tissue states. Moreover, a conditional prompt generator is designed to turn these instance-specific attributes into visual prompts. Extensive experiments on three diverse tasks demonstrate that our task-specific prompting method achieves better diagnostic performance as well as better interpretability. Code is available at https://***/7LFB/QAP.

Keywords: pathology image analysis; Prompt; quantitative attributes
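QAP's spatial attributes are described only as K-function-based. For orientation, here is a minimal sketch of one common estimator of Ripley's K function over a set of 2-D points; using nucleus centroids as the points is an assumption for illustration, and the sketch says nothing about how QAP turns such values into prompts. The function `ripley_k` is hypothetical, and it applies no edge correction.

```python
import numpy as np


def ripley_k(points: np.ndarray, radii: np.ndarray, area: float) -> np.ndarray:
    """Naive Ripley's K estimate for an (n, 2) array of point coordinates.

    K(r) = area / (n * (n - 1)) * #{ordered pairs i != j with d_ij <= r}.
    No edge correction is applied, so K is biased low near the border.
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude self-pairs i == j
    counts = np.array([(dist <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))


# Four hypothetical centroids at the corners of a unit square.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
radii = np.array([0.5, 1.0, 1.5])
print(ripley_k(pts, radii, area=1.0))
```

Under complete spatial randomness K(r) is roughly πr², so values above that curve point to clustering and values below it to regular spacing; a descriptor built on K would typically compare against that baseline at a few radii.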

MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Abdelfattah, Mohamed; Hassan, Mariam; Alahi, Alexandre. Affiliation: Ecole Polytech Fed Lausanne (EPFL), Lausanne, Switzerland

ISBN (print): 9798350353006

Current transformer-based skeletal action recognition models tend to focus on a limited set of joints and low-level motion patterns to predict action classes. This results in significant performance degradation under small skeleton perturbations or changing the pose estimator between training and testing. In this work, we introduce MaskCLR, a new Masked Contrastive Learning approach for Robust skeletal action recognition. We propose an Attention-Guided Probabilistic Masking strategy to occlude the most important joints and encourage the model to explore a larger set of discriminative joints. Furthermore, we propose a Multi-Level Contrastive Learning paradigm to enforce the representations of standard and occluded skeletons to be class-discriminative, i.e., more compact within each class and more dispersed across different classes. Our approach helps the model capture the high-level action semantics instead of low-level joint variations, and can be conveniently incorporated into transformer-based models. Without loss of generality, we combine MaskCLR with three transformer backbones: the vanilla transformer, DSTFormer, and STTFormer. Extensive experiments on NTU60, NTU120, and Kinetics400 show that MaskCLR consistently outperforms previous state-of-the-art methods on standard and perturbed skeletons from different pose estimators, showing improved accuracy, generalization, and robustness. Project website: https://***.

Keywords: contrastive learning; Skeleton-based action recognition
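The abstract names an attention-guided probabilistic masking strategy without spelling it out. One plausible reading, sketched below under that assumption, is to sample joints to occlude with probability proportional to their attention scores, so the joints the model relies on most are hidden most often. `attention_guided_mask` and its arguments are hypothetical, and MaskCLR's actual sampling rule and masking value may differ.

```python
import numpy as np


def attention_guided_mask(skeleton: np.ndarray, attention: np.ndarray,
                          num_masked: int,
                          rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical masking: zero out joints sampled by attention weight.

    skeleton: (frames, joints, channels) array of joint coordinates.
    attention: (joints,) non-negative importance scores, for example
    attention averaged over heads and frames (an assumption here).
    Returns the masked copy and the indices of the masked joints.
    """
    probs = attention / attention.sum()
    # Sample without replacement: high-attention joints are likelier to be
    # hidden, but any joint with a non-zero score can be picked.
    masked = rng.choice(len(attention), size=num_masked, replace=False, p=probs)
    out = skeleton.copy()
    out[:, masked, :] = 0.0
    return out, masked


rng = np.random.default_rng(0)
skeleton = rng.normal(size=(8, 17, 3))   # 8 frames, 17 joints, xyz
attention = rng.random(17) ** 4          # a few joints dominate the scores
occluded, idx = attention_guided_mask(skeleton, attention, num_masked=3, rng=rng)
print(sorted(idx.tolist()))
```

Sampling rather than always hiding the top-k joints keeps some randomness between training steps; `rng.choice` raises a `ValueError` if fewer joints have non-zero scores than `num_masked`.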

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021