Search Results - Inner Mongolia University Library

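Search help

Field codes:

T = title (book title or other title); A = author (responsible party); K = subject term; P = publication name; PU = publisher name; O = institution (author affiliation, degree-granting institution, patent applicant); L = Chinese Library Classification (CLC) number; C = discipline classification number; U = all fields; Y = year (year of publication, year of degree, year a standard was issued)

Search rules:

AND means "and"; OR means "or"; NOT means "excludes". Operators must be written in uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

Example 1 finds works with the subject term 图书馆学 (library science) or 情报学 (information science) by the author 范并思 (Fan Bingsi), dated 1982-2016. Example 2 finds items in the journal 计算机应用与软件 (Computer Applications and Software) that mention C++ or Basic in any field, excluding the subject term Visual, dated 2011-2016.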
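The following hypothetical Python helper (Python 3.8+) illustrates the rules above by assembling query strings in this syntax. The names `term`, `group`, and `build_query` are invented for illustration. The catalog offers only a search box, not a programming API, and this sketch does not submit queries or check whether a field value is meaningful. It enforces only the stated conventions: known field codes, uppercase operators, one space on each side of an operator, and parentheses for grouping.

```python
from typing import Tuple

# Field codes as listed in the catalog's help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
# Operators must be written in uppercase.
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def group(expr: str) -> str:
    """Wrap a sub-expression in parentheses so it is evaluated as a unit."""
    return f"({expr})"


def build_query(first: str, *rest: Tuple[str, str]) -> str:
    """Chain clauses left to right, e.g. build_query(c1, ("AND", c2), ("NOT", c3)).

    Each operator must be uppercase AND/OR/NOT and is padded with exactly
    one space on each side, as the search rules require.
    """
    query = first
    for op, clause in rest:
        if op not in OPERATORS:
            raise ValueError(f"operator must be uppercase AND/OR/NOT, got {op!r}")
        query += f" {op} {clause}"
    return query


# Rebuild the two examples from the help text.
example_1 = build_query(
    group(build_query(term("K", "图书馆学"), ("OR", term("K", "情报学")))),
    ("AND", term("A", "范并思")),
    ("AND", term("Y", "1982-2016")),
)
example_2 = build_query(
    term("P", "计算机应用与软件"),
    ("AND", group(build_query(term("U", "C++"), ("OR", term("U", "Basic"))))),
    ("NOT", term("K", "Visual")),
    ("AND", term("Y", "2011-2016")),
)
print(example_1)
print(example_2)
```

Running it prints the two examples exactly as they appear in the help text. A lowercase operator such as `"and"` raises `ValueError`, which mirrors the rule that operators must be uppercase.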

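Refine search results (counts are numbers of records)

Document type

12,844 conference papers
13 journal articles
2 books

Holdings scope

12,859 electronic documents
0 print holdings

Discipline classification

7,573 Engineering
- 6,863 Computer Science and Technology...
- 880 Mechanical Engineering
- 814 Software Engineering
- 435 Control Science and Engineering
- 360 Optical Engineering
- 306 Electrical Engineering
- 209 Instrument Science and Technology
- 124 Information and Communication Engineering
- 91 Bioengineering
- 62 Biomedical Engineering (may confer...
- 39 Electronic Science and Technology (may...
- 34 Safety Science and Engineering
- 26 Chemical Engineering and Technology
- 21 Transportation Engineering
- 20 Architecture
- 18 Civil Engineering
2,957 Medicine
- 2,956 Clinical Medicine
- 15 Basic Medicine (may confer medicine...
- 12 Pharmacy (may confer medicine, sci...
700 Science
- 359 Physics
- 225 Mathematics
- 175 Systems Science
- 95 Statistics (may confer science, ...
- 93 Biology
- 22 Chemistry
201 Art
- 201 Design (may confer art...
84 Management
- 59 Library, Information and Archives Man...
- 25 Management Science and Engineering (may...
- 14 Business Administration
23 Law
- 21 Sociology
5 Agriculture
4 Education
2 Economics
1 Military Science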

Topics

6,464 computer vision
2,693 training
2,440 pattern recognit...
1,778 computational mo...
1,528 visualization
1,348 three-dimensiona...
1,091 computer archite...
1,061 semantics
997 benchmark testin...
980 codes
970 conferences
852 feature extracti...
828 cameras
771 task analysis
708 deep learning
645 image segmentati...
611 object detection
584 shape
554 transformers
543 neural networks

Institutions

132 univ sci & techn...
122 carnegie mellon ...
118 tsinghua univ pe...
114 univ chinese aca...
113 chinese univ hon...
94 tsinghua univers...
91 zhejiang univ pe...
91 swiss fed inst t...
83 university of ch...
80 zhejiang univers...
78 peng cheng labor...
77 shanghai ai lab ...
75 university of sc...
72 peng cheng lab p...
69 shanghai jiao to...
69 shanghai jiao to...
69 sensetime res pe...
68 stanford univ st...
67 alibaba grp peop...
67 univ hong kong p...

Authors

77 timofte radu
63 van gool luc
45 zhang lei
39 luc van gool
36 yang yi
33 tao dacheng
31 loy chen change
29 chen chen
29 sun jian
28 qi tian
25 li xin
24 liu yang
24 tian qi
24 ying shan
24 wang xinchao
23 zha zheng-jun
22 boxin shi
21 zhou jie
21 vasconcelos nuno
20 luo ping

Language

12,850 English
8 Other
1 Chinese

Search criteria: Any field = "IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops"

12,859 records in total; showing records 551-560

MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cao, Jianjian; Ye, Peng; Li, Shengze; Yu, Chong; Tang, Yansong; Lu, Jiwen; Chen, Tao
Affiliations: Fudan Univ Sch Informat Sci & Technol Shanghai Peoples R China; Fudan Univ Acad Engn & Technol Shanghai Peoples R China; Tsinghua Univ Tsinghua Shenzhen Int Grad Sch Beijing Peoples R China; Tsinghua Univ Dept Automat Beijing Peoples R China

ISBN (print): 9798350353006

Vision-Language Transformers (VLTs) have shown great success recently, but are meanwhile accompanied by heavy computation costs, where a major reason can be attributed to the large number of visual and language tokens. Existing token pruning research for compressing VLTs mainly follows a single-modality-based scheme yet ignores the critical role of aligning different modalities for guiding the token pruning process, causing the important tokens for one modality to be falsely pruned in another modality branch. Meanwhile, existing VLT pruning works also lack the flexibility to dynamically compress each layer based on different input samples. To this end, we propose a novel framework named Multimodal Alignment-Guided Dynamic Token Pruning (MADTP) for accelerating various VLTs. Specifically, we first introduce a well-designed Multi-modality Alignment Guidance (MAG) module that can align features of the same semantic concept from different modalities, to ensure the pruned tokens are less important for all modalities. We further design a novel Dynamic Token Pruning (DTP) module, which can adaptively adjust the token compression ratio in each layer based on different input instances. Extensive experiments on various benchmarks demonstrate that MADTP significantly reduces the computational complexity of various kinds of multimodal models while preserving competitive performance. Notably, when applied to the BLIP model on the NLVR2 dataset, MADTP can reduce the GFLOPs by 80% with less than 4% performance degradation. The code is available at https://***/double125/MADTP.

Keywords: Model Compress Token Pruning
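The abstract describes a compression ratio that adapts to each input, as in the DTP module. The Python sketch below gives a rough illustration of that idea. It keeps the fewest highest-scoring tokens whose softmax-normalized importance reaches a target mass, so a peaked score distribution keeps few tokens and a flat one keeps many. This is a generic heuristic, not MADTP's actual method. The importance scores are simply passed in (MADTP derives them with its MAG alignment module, which is not modeled here), and `dynamic_prune` and `mass` are hypothetical names.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over raw importance scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_prune(tokens: Sequence[T], scores: Sequence[float], mass: float = 0.9) -> List[T]:
    """Keep the fewest highest-importance tokens whose softmax weights sum to >= mass.

    The number of survivors depends on how peaked the scores are, so the
    compression ratio changes from one input to the next. Survivors are
    returned in their original order.
    """
    if len(tokens) != len(scores) or not tokens:
        raise ValueError("need one score per token and at least one token")
    weights = softmax(scores)
    ranked = sorted(range(len(tokens)), key=lambda i: weights[i], reverse=True)
    kept, covered = [], 0.0
    for i in ranked:
        kept.append(i)
        covered += weights[i]
        if covered >= mass:
            break
    return [tokens[i] for i in sorted(kept)]


# Peaked scores keep one token; flat scores keep all four.
print(dynamic_prune(["a", "b", "c", "d"], [10.0, 0.0, 0.0, 0.0]))
print(dynamic_prune(["a", "b", "c", "d"], [1.0, 1.0, 1.0, 1.0]))
```

Because the function uses a cumulative-mass threshold rather than a fixed top-k, it never needs a per-layer ratio as a hyperparameter. `mass` controls how aggressive pruning is on average.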

Towards Explaining Image-Based Distribution Shifts

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kulinski, Sean; Inouye, David I.
Affiliations: Purdue Univ Sch Elect & Comp Engn W Lafayette IN 47907 USA

ISBN (electronic): 9781665487399

ISBN (print): 9781665487399

Distribution shift can have fundamental consequences such as signaling a change in the operating environment or significantly reducing the accuracy of downstream models. Thus, understanding such distribution shifts is critical for examining and hopefully mitigating the effect of such a shift. Most prior work has focused on either natively handling distribution shift (e.g., Domain Generalization) or merely detecting a shift while assuming any detected shift can be understood and handled appropriately by a human operator. For the latter, we hope to aid in these manual mitigation tasks by explaining the distribution shift to an operator. To this end, we suggest two methods: providing a set of interpretable mappings from the original distribution to the shifted one or providing a set of distributional counterfactual examples. We provide preliminary experiments on these two methods, and discuss important concepts and challenges for moving towards a better understanding of image-based distribution shifts.

Keywords: computer vision; conferences; pattern recognition; Task analysis

Adversarial Counterfactual Visual Explanations

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jeanneret, Guillaume; Simon, Loic; Jurie, Frederic
Affiliations: Univ Caen Normandie ENSICAEN CNRS Caen France

ISBN (print): 9798350301298

Counterfactual explanations and adversarial attacks have a related goal: flipping output labels with minimal perturbations regardless of their characteristics. Yet, adversarial attacks cannot be used directly in a counterfactual explanation perspective, as such perturbations are perceived as noise and not as actionable and understandable image modifications. Building on the robust learning literature, this paper proposes an elegant method to turn adversarial attacks into semantically meaningful perturbations, without modifying the classifiers to explain. The proposed approach hypothesizes that Denoising Diffusion Probabilistic Models are excellent regularizers for avoiding high-frequency and out-of-distribution perturbations when generating adversarial attacks. The paper's key idea is to build attacks through a diffusion model to polish them. This allows studying the target model regardless of its robustification level. Extensive experimentation shows the advantages of our counterfactual explanation approach over current State-of-the-Art in multiple testbeds.

Keywords: Explainable computer vision

Scaling Language-Image Pre-training via Masking

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Yanghao; Fan, Haoqi; Hu, Ronghang; Feichtenhofer, Christoph; He, Kaiming
Affiliations: Meta AI FAIR New York NY 10023 USA

ISBN (print): 9798350301298

We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method for training CLIP [52]. Our method randomly masks out and removes a large portion of image patches during training. Masking allows us to learn from more image-text pairs given the same wall-clock time and contrast more samples per iteration with similar memory footprint. It leads to a favorable trade-off between accuracy and training time. In our experiments on 400 million image-text pairs, FLIP improves both accuracy and speed over the no-masking baseline. On a large diversity of downstream tasks, FLIP dominantly outperforms the CLIP counterparts trained on the same data. Facilitated by the speedup, we explore the scaling behavior of increasing the model size, data size, or training length, and report encouraging results and comparisons. We hope that our work will foster future research on scaling vision-language learning.

Keywords: and reasoning language vision
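The core step in FLIP is to remove a large random portion of image patches before encoding. Below is a minimal Python sketch. It assumes the patches are already flattened into a list and are dropped uniformly at random without replacement. `random_patch_keep` and `mask_ratio` are hypothetical names and do not come from the FLIP release.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_patch_keep(patches: Sequence[T], mask_ratio: float, rng: random.Random) -> List[T]:
    """Randomly drop a `mask_ratio` fraction of patches; return the survivors in order.

    Only the kept patches would be fed to the image encoder, so it processes
    far fewer tokens per image.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    n_keep = round(len(patches) * (1.0 - mask_ratio))
    # Sample distinct indices without replacement, then restore spatial order.
    keep_idx = sorted(rng.sample(range(len(patches)), n_keep))
    return [patches[i] for i in keep_idx]


rng = random.Random(0)
patches = list(range(196))  # e.g. patch ids of a 14 x 14 grid
kept = random_patch_keep(patches, mask_ratio=0.5, rng=rng)
print(len(kept))
```

The reduction in per-iteration compute and memory comes from this step, because only the surviving patches go through the image encoder. A seeded `random.Random` keeps the example reproducible.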

Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Junxi; Li, Liang; Su, Li; Zha, Zheng-Jun; Huang, Qingming
Affiliations: Univ Chinese Acad Sci Beijing Peoples R China; Chinese Acad Sci Key Lab Intell Info Proc ICT Beijing Peoples R China; Peng Cheng Lab Shenzhen Peoples R China; Univ Sci & Technol China Hefei Peoples R China; Chinese Acad Sci Key Lab Safety Beijing Peoples R China

ISBN (print): 9798350353006

Weakly-supervised Video Anomaly Detection (wVAD) aims to detect frame-level anomalies using only video-level labels in training. Due to the limitation of coarse-grained labels, Multi-Instance Learning (MIL) is prevailing in wVAD. However, MIL suffers from insufficiency of binary supervision to model diverse abnormal patterns. Besides, the coupling between abnormality and its context hinders the learning of clear abnormal event boundary. In this paper, we propose prompt-enhanced MIL to detect various abnormal events while ensuring clear event boundaries. Concretely, we design the abnormal-aware prompts by using abnormal class annotations together with learnable prompt, which can incorporate semantic priors into video features dynamically. The detector can utilize the semantic-rich features to capture diverse abnormal patterns. In addition, normal context prompt is introduced to amplify the distinction between abnormality and its context, facilitating the generation of clear boundary. With the mutual enhancement of abnormal-aware and normal context prompt, the model can construct discriminative representations to detect divergent anomalies without ambiguous event boundaries. Extensive experiments demonstrate our method achieves SOTA performance on three public benchmarks. The code is available at https://***/Junxi-Chen/PE-MIL.

Keywords: computer vision; Video Anomaly Detection
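In the MIL setting this abstract uses, each video is a bag of snippets and only the video-level label is available. A common baseline objective scores a video by the mean of its k highest snippet anomaly scores and compares that score with the video label using binary cross-entropy. This is not the prompt-enhanced PE-MIL method. The Python sketch below assumes the snippet scores are already probabilities in [0, 1]; `topk_mil_loss` and `k` are illustrative names.

```python
import math
from typing import Sequence


def topk_mil_loss(snippet_scores: Sequence[float], video_label: int, k: int) -> float:
    """BCE between a video-level label (0 normal, 1 anomalous) and the mean of the top-k snippet scores."""
    if not 1 <= k <= len(snippet_scores):
        raise ValueError("k must be between 1 and the number of snippets")
    top = sorted(snippet_scores, reverse=True)[:k]
    p = sum(top) / k
    # Clamp to avoid log(0) when scores saturate at 0 or 1.
    eps = 1e-7
    p = min(max(p, eps), 1.0 - eps)
    return -(video_label * math.log(p) + (1 - video_label) * math.log(1.0 - p))


scores = [0.1, 0.9, 0.8, 0.2]
print(topk_mil_loss(scores, video_label=1, k=2))  # approximately -log(0.85)
print(topk_mil_loss(scores, video_label=0, k=2))  # approximately -log(0.15)
```

Only the top-scoring snippets contribute to the loss. Training therefore pushes anomalous videos to contain a few high-scoring snippets and normal videos to contain none, without any frame-level labels.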

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cheng, Yihua; Zhu, Yaning; Wang, Zongji; Hao, Hongquan; Liu, Yongwei; Cheng, Shiqing; Wang, Xi; Chang, Hyung Jin
Affiliations: Univ Birmingham Birmingham W Midlands England; Huazhong Univ Sci & Technol Wuhan Hubei Peoples R China; Chinese Acad Sci NIST Beijing Peoples R China; CalmCar Suzhou Jiangsu Peoples R China

ISBN (print): 9798350353013; 9798350353006

Driver's eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles. Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive and well-annotated datasets in real driving scenarios. In this paper, we present three novel elements to advance in-vehicle gaze research. First, we introduce IVGaze, a pioneering dataset capturing in-vehicle gaze, collected from 125 subjects and covering a large range of gaze and head poses within vehicles. In this dataset, we propose a new vision-based solution for in-vehicle gaze collection, introducing a refined gaze target calibration method to tackle annotation challenges. Second, our research focuses on in-vehicle gaze estimation leveraging the IVGaze. In-vehicle face images often suffer from low resolution, prompting our introduction of a gaze pyramid transformer that leverages transformer-based multilevel features integration. Expanding upon this, we introduce the dual-stream gaze pyramid transformer (GazeDPTR). Employing perspective transformation, we rotate virtual cameras to normalize images, utilizing camera pose to merge normalized and original images for accurate gaze estimation. GazeDPTR shows state-of-the-art performance on the IVGaze dataset. Third, we explore a novel strategy for gaze zone classification by extending the GazeDPTR. We define a new foundational tri-plane and project gaze onto these planes. Leveraging both positional features from the projection points and visual attributes from images, we achieve superior performance compared to relying solely on visual features, substantiating the advantage of gaze estimation. The project is available at https://***/work/ivgaze.

Keywords: Image annotation
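The abstract mentions projecting gaze onto newly defined planes. Their exact geometry is not given here, so the Python sketch below shows only the standard ray-plane intersection for a single plane. Let o be the gaze origin, d the gaze direction, q a point on the plane, and n the plane normal. The hit point is then o + t·d, where t = n·(q - o) / (n·d). All coordinates in the example are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_gaze(origin: Vec3, direction: Vec3, plane_point: Vec3, plane_normal: Vec3,
                 eps: float = 1e-9) -> Optional[Vec3]:
    """Intersect the gaze ray origin + t * direction (t >= 0) with a plane.

    Returns None when the ray is parallel to the plane or the plane lies behind the eye.
    """
    denom = dot(plane_normal, direction)
    if abs(denom) < eps:
        return None
    offset = tuple(q - o for q, o in zip(plane_point, origin))
    t = dot(plane_normal, offset) / denom
    if t < 0:
        return None
    return tuple(o + t * d for o, d in zip(origin, direction))


# Hypothetical setup: eye at the origin, a plane 2 units ahead at z = 2.
print(project_gaze((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (0.0, 0.0, 2.0), (0.0, 0.0, 1.0)))
```

With several planes, one would intersect the ray with each plane and keep the valid hits. The resulting 2D positions could then serve as positional features like those the abstract describes.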

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yu, Tianyu; Yao, Yuan; Zhang, Haoye; He, Taiwen; Han, Yifeng; Cui, Ganqu; Hu, Jinyi; Liu, Zhiyuan; Zheng, Hai-Tao; Sun, Maosong
Affiliations: Tsinghua Univ Beijing Peoples R China; Natl Univ Singapore Singapore Singapore; Tsinghua Univ Shenzhen Int Grad Sch Beijing Peoples R China; Pengcheng Lab Shenzhen Peoples R China

ISBN (print): 9798350353006

Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs prevalently suffer from serious hallucination problems, generating text that is not factually grounded in associated images. The problem makes existing MLLMs untrustworthy and thus impractical in real-world (especially high-stakes) applications. To address the challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preference in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over the human feedback. Comprehensive experiments on five benchmarks in both automatic and human evaluation show that RLHF-V can enable substantially more trustworthy MLLM behaviors with promising data and computation efficiency. Remarkably, using 1.4k annotated data samples, RLHF-V significantly reduces the hallucination rate of the base MLLM by 34.8%, outperforming the concurrent LLaVA-RLHF trained on 10k annotated data. The final model achieves state-of-the-art performance in trustworthiness among open-source MLLMs, and shows better robustness than GPT-4V in preventing hallucinations aroused from over-generalization.

Keywords: hallucination language reasoning vision
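RLHF-V's dense direct preference optimization builds on direct preference optimization (DPO). The abstract does not detail the dense, segment-level weighting, so the Python sketch below shows only the standard per-pair DPO loss:

-log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))])

Here y_w is the preferred (corrected) response, y_l is the rejected one, and each log-probability is the response's summed token log-likelihood. The default `beta=0.1` is illustrative and is not a value reported for RLHF-V.

```python
import math


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair (chosen = preferred response)."""
    # Implicit reward margin: how much more the policy prefers the chosen
    # response than the frozen reference model does.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)) equals softplus(-margin); use a numerically stable form.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


print(dpo_loss(-12.0, -15.0, -13.0, -14.0))
```

A zero margin gives log 2. The loss falls toward zero as the policy favors the preferred response more strongly than the reference model does. The stable softplus form avoids overflow for very negative margins.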

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Hao; Chen, Ying; Chen, Yifei; Yu, Rongshan; Yang, Wenxian; Wang, Liansheng; Ding, Bowen; Han, Yuchen
Affiliations: Xiamen Univ Sch Informat Xiamen Peoples R China; Huawei Xiamen Peoples R China; Aginome Sci Xiamen Peoples R China; Shanghai Jiao Tong Univ Shanghai Chest Hosp Dept Pathol Sch Med Shanghai Peoples R China

ISBN (print): 9798350353006

Whole Slide Image (WSI) classification is often formulated as a Multiple Instance Learning (MIL) problem. Recently, Vision-Language Models (VLMs) have demonstrated remarkable performance in WSI classification. However, existing methods leverage coarse-grained pathogenetic descriptions for visual representation supervision, which are insufficient to capture the complex visual appearance of pathogenetic images, hindering the generalizability of models on diverse downstream tasks. Additionally, processing high-resolution WSIs can be computationally expensive. In this paper, we propose a novel "Fine-grained Visual-Semantic Interaction" (FiVE) framework for WSI classification. It is designed to enhance the model's generalizability by leveraging the interaction between localized visual patterns and fine-grained pathological semantics. Specifically, with meticulously designed queries, we start by utilizing a large language model to extract fine-grained pathological descriptions from various non-standardized raw reports. The output descriptions are then reconstructed into fine-grained labels used for training. By introducing a Task-specific Fine-grained Semantics (TFS) module, we enable prompts to capture crucial visual information in WSIs, which enhances representation learning and augments generalization capabilities significantly. Furthermore, given that pathological visual patterns are redundantly distributed across tissue slices, we sample a subset of visual instances during training. Our method demonstrates robust generalizability and strong transferability, dominantly outperforming the counterparts on the TCGA Lung Cancer dataset with at least 9.19% higher accuracy in few-shot experiments. The code is available at: https://***/ls1rius/WSI FiVE.

Keywords: Fine-Grained; Generalizable; Vision-Language-Model; Visual-Semantic; Whole Slide Image

TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Zhihao; Cao, Shengcao; Wang, Yu-Xiong
Affiliations: Xi An Jiao Tong Univ Xian Peoples R China; Univ Illinois Champaign IL USA

ISBN (print): 9798350353006

The limited scale of current 3D shape datasets hinders the advancements in 3D shape understanding, and motivates multi-modal learning approaches which transfer learned knowledge from data-abundant 2D image and language modalities to 3D shapes. However, even though the image and language representations have been aligned by cross-modal models like CLIP, we find that the image modality fails to contribute as much as the language in existing multi-modal 3D representation learning methods. This is attributed to the domain shift in the 2D images and the distinct focus of each modality. To more effectively leverage both modalities in the pre-training, we introduce TriAdapter Multi-Modal Learning (TAMM) - a novel two-stage learning approach based on three synergistic adapters. First, our CLIP Image Adapter mitigates the domain gap between 3D-rendered images and natural images, by adapting the visual representations of CLIP for synthetic image-text pairs. Subsequently, our Dual Adapters decouple the 3D shape representation space into two complementary sub-spaces: one focusing on visual attributes and the other for semantic understanding, which ensure a more comprehensive and effective multi-modal pre-training. Extensive experiments demonstrate that TAMM consistently enhances 3D representations for a wide range of 3D encoder architectures, pre-training datasets, and downstream tasks. Notably, we boost the zero-shot classification accuracy on Objaverse-LVIS from 46.8% to 50.7%, and improve the 5-way 10-shot linear probing classification accuracy on ModelNet40 from 96.1% to 99.0%. Project page: https://***/tamm-page.

Keywords: 3D shape classification; 3D vision; multi-modal learning

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Sijin; Chen, Xin; Zhang, Chi; Li, Mingsheng; Yu, Gang; Fei, Hao; Zhu, Hongyuan; Fan, Jiayuan; Chen, Tao
Affiliations: Fudan Univ Shanghai Peoples R China; Tencent PCG Shenzhen Peoples R China; Natl Univ Singapore Singapore Singapore; ASTAR Inst InfoComm Res I2R Singapore Singapore; ASTAR Ctr Frontier AI Res CFAR Singapore Singapore

ISBN (print): 9798350353006

Recent progress in Large Multimodal Models (LMM) has opened up great possibilities for various applications in the field of human-machine interactions. However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud representations of the 3D scene. Existing works seek help from multi-view images by projecting 2D features to 3D space, which inevitably leads to huge computational overhead and performance degradation. In this paper, we present LL3DA, a Large Language 3D Assistant that takes point cloud as the direct input and responds to both text instructions and visual interactions. The additional visual interaction enables LMMs to better comprehend human interactions with the 3D environment and further remove the ambiguities within plain texts. Experiments show that LL3DA achieves remarkable results and surpasses various 3D vision-language models on both 3D Dense Captioning and 3D Question Answering.

Keywords: large language models; Multi-modal learning; vision and language
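The abstract notes that 3D scenes arrive as point clouds, whose encodings must be permutation-invariant: reordering the points should not change the result. One standard way to get this property is PointNet-style pooling, which applies the same function to every point and then pools with a symmetric operation such as max. This is not necessarily LL3DA's encoder. Below is a minimal Python sketch with a hypothetical one-layer linear + ReLU point function.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def per_point_features(point: Point, weights: Sequence[Sequence[float]]) -> List[float]:
    """Shared per-point layer: ReLU(W @ point), using the same W for every point."""
    return [max(0.0, sum(w * x for w, x in zip(row, point))) for row in weights]


def global_feature(points: Sequence[Point], weights: Sequence[Sequence[float]]) -> List[float]:
    """Max-pool per-point features across points; the result ignores point order."""
    feats = [per_point_features(p, weights) for p in points]
    return [max(column) for column in zip(*feats)]


W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
shuffled = cloud[:]
random.Random(0).shuffle(shuffled)
print(global_feature(cloud, W) == global_feature(shuffled, W))  # True
```

Max is symmetric in its arguments, so any permutation of `cloud` yields the same global vector. Sum or mean pooling would also work; max is the choice popularized by PointNet.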

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postcode 010021
