检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

时间限定

出版年份：

文献类型

图书期刊文献学位论文多媒体

馆藏选择

电子馆藏纸本馆藏

核心期刊

全部期刊 SCI 收录期刊 SSCI 收录期刊 EI 收录期刊 CSCD 收录期刊 CSSCI 收录期刊

语言

中文英文

文献类型

期刊文献图书学位论文标准纸本馆藏

帮助

文字说明：

T=题名（书名、题名），A=作者（责任者），K=主题词，P=出版物名称，PU=出版社名称，O=机构（作者单位、学位授予单位、专利申请人），L=中图分类号，C=学科分类号，U=全部字段，Y=年（出版发行年、学位年度、标准发布年）

检索规则说明：

AND代表“并且”；OR代表“或者”；NOT代表“不包含”；(注意必须大写,运算符两边需空一格)

检索范例：

范例一：(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
范例二：P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

分类表

所选分类

>> <<

限定检索结果

文献类型

22,771 篇 会议
112 篇 期刊文献
23 册 图书

馆藏范围

22,905 篇 电子文献
1 种 纸本馆藏

日期分布

学科分类号

13,398 篇 工学
- 10,880 篇 计算机科学与技术...
- 3,450 篇 软件工程
- 2,430 篇 机械工程
- 1,721 篇 光学工程
- 1,010 篇 控制科学与工程
- 998 篇 电气工程
- 761 篇 信息与通信工程
- 393 篇 仪器科学与技术
- 337 篇 生物工程
- 257 篇 生物医学工程（可授...
- 215 篇 电子科学与技术（可...
- 113 篇 化学工程与技术
- 112 篇 安全科学与工程
- 98 篇 测绘科学与技术
- 92 篇 交通运输工程
- 86 篇 建筑学
- 82 篇 土木工程
3,362 篇 医学
- 3,348 篇 临床医学
- 79 篇 基础医学(可授医学...
3,250 篇 理学
- 1,953 篇 物理学
- 1,664 篇 数学
- 567 篇 统计学（可授理学、...
- 484 篇 生物学
- 245 篇 系统科学
- 109 篇 化学
506 篇 管理学
- 299 篇 图书情报与档案管...
- 219 篇 管理科学与工程(可...
- 75 篇 工商管理
252 篇 艺术学
- 252 篇 设计学（可授艺术学...
62 篇 法学
- 59 篇 社会学
40 篇 农学
25 篇 教育学
19 篇 经济学
11 篇 军事学
3 篇 文学

主题

10,126 篇 computer vision
4,025 篇 pattern recognit...
2,900 篇 training
1,958 篇 computational mo...
1,792 篇 cameras
1,758 篇 visualization
1,485 篇 shape
1,466 篇 image segmentati...
1,447 篇 feature extracti...
1,412 篇 three-dimensiona...
1,288 篇 robustness
1,169 篇 computer archite...
1,144 篇 layout
1,142 篇 computer science
1,134 篇 semantics
1,071 篇 object detection
1,043 篇 conferences
1,009 篇 benchmark testin...
967 篇 codes
810 篇 face recognition

机构

135 篇 univ sci & techn...
118 篇 univ chinese aca...
118 篇 chinese univ hon...
110 篇 carnegie mellon ...
99 篇 tsinghua univers...
99 篇 microsoft resear...
94 篇 swiss fed inst t...
92 篇 zhejiang univ pe...
82 篇 university of sc...
81 篇 zhejiang univers...
77 篇 shanghai ai lab ...
77 篇 university of ch...
72 篇 shanghai jiao to...
68 篇 microsoft res as...
65 篇 national laborat...
65 篇 alibaba grp peop...
64 篇 tsinghua univ pe...
63 篇 adobe research
60 篇 peking univ peop...
59 篇 peng cheng labor...

作者

78 篇 van gool luc
72 篇 timofte radu
63 篇 zhang lei
45 篇 luc van gool
40 篇 yang yi
37 篇 loy chen change
33 篇 xiaoou tang
33 篇 li stan z.
33 篇 qi tian
32 篇 sun jian
31 篇 liu yang
31 篇 li fei-fei
30 篇 chen chen
30 篇 tian qi
30 篇 pascal fua
29 篇 darrell trevor
28 篇 ying shan
27 篇 li xin
27 篇 vasconcelos nuno
27 篇 hanqing lu

语言

22,844 篇 英文
35 篇 其他
20 篇 中文
5 篇 土耳其文
2 篇 日文

检索条件"任意字段=1994 IEEE Computer-Society Conference on Computer Vision and Pattern Recognition"

共 22906 条记录，以下是261-270 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network

A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neu...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Ma, Ruichen Qiao, Guanchao Liu, Yian Meng, Liwei Ning, Ning Liu, Yang Hu, Shaogang Univ Elect Sci & Technol China Chengdu Peoples R China

ISBN: (纸本)9798350353013;9798350353006

Binary neural networks utilize 1-bit quantized weights and activations to reduce both the model's storage demands and computational burden. However, advanced binary architectures still incorporate millions of inefficient and non-hardware-friendly full-precision multiplication operations. A&B BNN is proposed to directly remove part of the multiplication operations in a traditional BNN and replace the rest with an equal number of bit operations, introducing the mask layer and the quantized RPReLU structure based on the normalizer-free network architecture. The mask layer can be removed during inference by leveraging the intrinsic characteristics of BNN with straightforward mathematical transformations to avoid the associated multiplication operations. The quantized RPReLU structure enables more efficient bit operations by constraining its slope to be integer powers of 2. Experimental results achieved 92.30%, 69.35%, and 66.89% on the CIFAR-10, CIFAR-100, and ImageNet datasets, respectively, which are competitive with the state-of-the-art. Ablation studies have verified the efficacy of the quantized RPReLU structure, leading to a 1.14% enhancement on the ImageNet compared to using a fixed slope RLeakyReLU. The proposed add&bit-operation-only BNN offers an innovative approach for hardware-friendly network architecture.

关键词： Binary neural networks Hardware-friendly structure Multiplication-free structure pattern recognition

来源：评论

学校读者我要写书评

暂无评论

Any-Shift Prompting for Generalization over Distributions

Any-Shift Prompting for Generalization over Distributions

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Xiao, Zehao Shen, Jiayi Derakhshani, Mohammad Mandi Liao, Shengcai Snoek, Cees G. M. Univ Amsterdam Amsterdam Netherlands Core42 Abu Dhabi U Arab Emirates

ISBN: (纸本)9798350353006

Image-language models with prompt learning have shown remarkable advances in numerous downstream vision tasks. Nevertheless, conventional prompt learning methods overfit their training distribution and lose the generalization ability on test distributions. To improve generalization across various distribution shifts, we propose any-shift prompting: a general probabilistic inference framework that considers the relationship between training and test distributions during prompt learning. We explicitly connect training and test distributions in the latent space by constructing training and test prompts in a hierarchical architecture. Within this frame-work, the test prompt exploits the distribution relationships to guide the generalization of the CLIP image-language model from training to any test distribution. To effectively encode the distribution information and their relationships, we further introduce a transformer inference network with a pseudo-shift training mechanism. The network generates the tailored test prompt with both training and test information in a feedforward pass, avoiding extra training costs at test time. Extensive experiments on twenty-three datasets demonstrate the effectiveness of any-shift prompting on the generalization over various distribution shifts.

关键词： distribution shifts out-of-distribution generalization prompt learning vision-language

来源：评论

学校读者我要写书评

暂无评论

Instance-Aware Group Quantization for vision Transformers

Instance-Aware Group Quantization for Vision Transformers

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Jaehyeon, Moon Kim, Dohyung Cheon, Junyong Ham, Bumsub Yonsei Univ Seoul South Korea Articron Bogota Colombia

ISBN: (纸本)9798350353006

Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model using only a small calibration set of unlabeled samples without retraining. PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to full-precision counterparts. Directly applying them to vision transformers (ViTs), however, incurs severe performance degradation, mainly due to the differences in architectures between CNNs and ViTs. In particular, the distribution of activations for each channel vary drastically according to input instances, making PTQ methods for CNNs inappropriate for ViTs. To address this, we introduce instance-aware group quantization for ViTs (IGQ-ViT). To this end, we propose to split the channels of activation maps into multiple groups dynamically for each input instance, such that activations within each group share similar statistical properties. We also extend our scheme to quantize softmax attentions across tokens. In addition, the number of groups for each layer is adjusted to minimize the discrepancies between predictions from quantized and full-precision models, under a bit-operation (BOP) constraint. We show extensive experimental results on image classification, object detection, and instance segmentation, with various transformer architectures, demonstrating the effectiveness of our approach.

关键词： Post-training quantization ViT

来源：评论

学校读者我要写书评

暂无评论

CPLIP: Zero-Shot Learning for Histopathology with Comprehensive vision-Language Alignment

CPLIP: Zero-Shot Learning for Histopathology with Comprehens...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Javed, Sajid Mahmood, Arif Ganapathil, Iyyakutti Iyappan Dharej, Fayaz Ali Werghil, Naoufel Bennamoun, Mohammed Khalifa Univ Sci & Technol Dept Comp Sci Abu Dhabi U Arab Emirates Khalifa Univ Sci & Technol C2PS Abu Dhabi U Arab Emirates Informat Technol Univ Punjab Lahore Pakistan Univ Western Australia Perth WA Australia

ISBN: (纸本)9798350353006

This paper proposes Comprehensive Pathology Language Image Pre-training (CPLIP), a new unsupervised technique designed to enhance the alignment of images and text in histopathology for tasks such as classification and segmentation. This methodology enriches vision-language models by leveraging extensive data without needing ground truth annotations. CPLIP involves constructing a pathology-specific dictionary, generating textual descriptions for images using language models, and retrieving relevant images for each text snippet via a pre-trained model. The model is then fine-tuned using a many-to-many contrastive learning method to align complex interrelated concepts across both modalities. Evaluated across multiple histopathology tasks, CPLIP shows notable improvements in zero-shot learning scenarios, outperforming existing methods in both interpretability and robustness and setting a higher benchmark for the application of vision-language models in the field. To encourage further research and replication, the code for CPLIP is available on GitHub at https://***/

关键词： Cancer Detection Computational Pathology Contrastive Loss Histopathology Many-to-Many vision-Language Alignment vision Language Modeling Whole Slide Image Zero-shot Learning

来源：评论

学校读者我要写书评

暂无评论

A Closer Look at the Few-Shot Adaptation of Large vision-Language Models

A Closer Look at the Few-Shot Adaptation of Large Vision-Lan...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Iguez, Julio Silva-Rodr Hajimiri, Sina Ben Ayed, Ismail Dolz, Jose ETS Montreal Montreal PQ Canada

ISBN: (纸本)9798350353006

Efficient transfer learning (ETL) is receiving increasing attention to adapt large pre-trained language-vision models on downstream tasks with a few labeled samples. While significant progress has been made, we reveal that state-of-the-art ETL approaches exhibit strong performance only in narrowly-defined experimental setups, and with a careful adjustment of hyperparameters based on a large corpus of labeled samples. In particular, we make two interesting, and surprising empirical observations. First, to out-perform a simple Linear Probing baseline, these methods require to optimize their hyper-parameters on each target task. And second, they typically underperform -sometimes dramatically-standard zero-shot predictions in the presence of distributional drifts. Motivated by the unrealistic assumptions made in the existing literature, i.e., access to a large validation set and case-specific grid-search for optimal hyperparameters, we propose a novel approach that meets the requirements of real-world scenarios. More concretely, we introduce a CLass-Adaptive linear Probe ( CLAP) objective, whose balancing term is optimized via an adaptation of the general Augmented Lagrangian method tailored to this context. We comprehensively evaluate CLAP on a broad span of datasets and scenarios, demonstrating that it consistently outperforms SoTA approaches, while yet being a much more efficient alternative. Code available at https://***/jusiro/CLAP.

关键词：

来源：评论

学校读者我要写书评

暂无评论

ViT-CoMer: vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions

ViT-CoMer: Vision Transformer with Convolutional Multi-scale...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Xia, Chunlong Wang, Xinliang Lv, Feng Hao, Xin Shi, Yifeng Baidu Inc Beijing Peoples R China

ISBN: (纸本)9798350353013;9798350353006

Although vision Transformer (ViT) has achieved significant success in computer vision, it does not perform well in dense prediction tasks due to the lack of inner-patch information interaction and the limited diversity of feature scale. Most existing studies are devoted to designing vision-specific transformers to solve the above problems, which introduce additional pre-training costs. Therefore, we present a plain, pre-training-free, and feature-enhanced ViT back-bone with Convolutional Multi-scale feature interaction, named ViT-CoMer, which facilitates bidirectional interaction between CNN and transformer. Compared to the state-of-the-art, ViT-CoMer has the following advantages: (1) We inject spatial pyramid multi-receptive field convolutional features into the ViT architecture, which effectively alleviates the problems of limited local information interaction and single-feature representation in ViT. (2) We propose a simple and efficient CNN-Transformer bidirectional fusion interaction module that performs multi-scale fusion across hierarchical features, which is beneficial for handling dense prediction tasks. (3) We evaluate the performance of ViT-CoMer across various dense prediction tasks, different frameworks, and multiple advanced pre-training. Notably, our ViT-CoMer-L achieves 64.3% AP on COCO val2017 without extra training data, and 62.1% mIoU on ADE20K val, both of which are comparable to state-of-the-art methods. We hope ViT-CoMer can serve as a new backbone for dense prediction tasks to facilitate future research. The code will be released at https://***/Traffic-X/ViT-CoMer.

关键词： DensePrediction visionFoundationBackbone visionTransformer

来源：评论

学校读者我要写书评

暂无评论

SonicvisionLM: Playing Sound with vision Language Models

SonicVisionLM: Playing Sound with Vision Language Models

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Xie, Zhifeng Yu, Shengye He, Qile Li, Mengtian Shanghai Univ Shanghai Peoples R China Shanghai Engn Res Ctr Mot Picture Special Effects Shanghai Peoples R China

ISBN: (纸本)9798350353006

There has been a growing interest in the task of generating sound for silent videos, primarily because of its practicality in streamlining video post-production. However, existing methods for video-sound generation attempt to directly create sound from visual representations, which can be challenging due to the difficulty of aligning visual representations with audio representations. In this paper, we present SonicvisionLM, a novel framework aimed at generating a wide range of sound effects by leveraging vision-language models(VLMs). Instead of generating audio directly from video, we use the capabilities of powerful VLMs. When provided with a silent video, our approach first identifies events within the video using a VLM to suggest possible sounds that match the video content. This shift in approach transforms the challenging task of aligning image and audio into more well-studied sub-problems of aligning image-to-text and text-to-audio through the popular diffusion models. To improve the quality of audio recommendations with LLMs, we have collected an extensive dataset that maps text descriptions to specific sound effects and developed a time-controlled audio adapter. Our approach surpasses current state-of-the-art methods for converting video to audio, enhancing synchronization with the visuals, and improving alignment between audio and video components. Project page: https://***. io/***/

关键词： audio generation multi-modal vision-language model

来源：评论

学校读者我要写书评

暂无评论

Boosting Continual Learning of vision-Language Models via Mixture-of-Experts Adapters

Boosting Continual Learning of Vision-Language Models via Mi...

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Yu, Jiazuo Zhuge, Yunzhi Zhang, Lu Hu, Ping Wang, Dong Lu, Huchuan He, You Dalian Univ Technol Dalian Peoples R China Univ Elect Sci & Technol China Chengdu Peoples R China Tsinghua Univ Beijing Peoples R China

ISBN: (纸本)9798350353006

Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout life-long learning and (ii) significant computational burdens associated with full-model tuning. In this work, we present a parameter-efficient continual learning framework to alleviate long-term forgetting in incremental learning with vision-language models. Our approach involves the dynamic expansion of a pre-trained CLIP model, through the integration of Mixture-of-Experts (MoE) adapters in response to new tasks. To preserve the zero-shot recognition capability of vision-language models, we further introduce a Distribution Discriminative Auto-Selector (DDAS) that automatically routes in-distribution and out-of-distribution inputs to the MoE Adapter and the original CLIP, respectively. Through extensive experiments across various settings, our proposed method consistently outperforms previous state-of-the-art approaches while concurrently reducing parameter training burdens by 60%. Our code locates at https://***/JiazuoYu/MoE-Adapters4CL

关键词： class incremental learning continual learning task incremental learning vision-Language Model

来源：评论

学校读者我要写书评

暂无评论

YOLO-World: Real-Time Open-Vocabulary Object Detection

YOLO-World: Real-Time Open-Vocabulary Object Detection

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Cheng, Tianheng Sone, Lin Ge, Yixiao Liu, Wenyu Wang, Xinggang Shan, Yong Tencent AI Lab Shenzhen Guangdong Peoples R China Tencent PCG ARC Lab Shenzhen Guangdong Peoples R China Huazhong Univ Sci & Technol Sch EIC Wuhan Hubei Peoples R China

ISBN: (纸本)9798350353006

The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling and pre-training on large-scale datasets. Specifically, we propose a new Re-parameterizable vision-Language Path Aggregation Network (RepVL-PAN) and region-text contrastive loss to facilitate the interaction between visual and linguistic information. Our method excels in detecting a wide range of objects in a zero-shot manner with high efficiency. On the challenging LVIS dataset, YOLO- World achieves 35.4 AP with 52.0 FPS on V100, which outperforms many state-of-the-art methods in terms of both accuracy and speed. Furthermore, the fine-tuned YOLO-World achieves remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation. Code and models are available at: https://***/AILab-CVC/YOLO-World.

关键词： Object detection open-vocabulary object detection real-time object detection vision-language modeling vision-language pre-training YOLO

来源：评论

学校读者我要写书评

暂无评论

RMT: Retentive Networks Meet vision Transformers

RMT: Retentive Networks Meet Vision Transformers

引用

ieee/CVF conference on computer vision and pattern recognition (CVPR)

作者： Fan, Qihang Huang, Huaibo Chen, Mingrui Liu, Hongmin He, Ran Chinese Acad Sci Inst Automat MAIS & CRIPAC Beijing Peoples R China Univ Chinese Acad Sci Sch Artificial Intelligence Beijing Peoples R China Univ Sci & Technol Beijing Beijing Peoples R China

ISBN: (纸本)9798350353013;9798350353006

vision Transformer (ViT) has gained increasing attention in the computer vision community in recent years. However, the core component of ViT, Self-Attention, lacks explicit spatial priors and bears a quadratic computational complexity, thereby constraining the applicability of ViT. To alleviate these issues, we draw inspiration from the recent Retentive Network (RetNet) in the field of NLP, and propose RMT, a strong vision backbone with explicit spatial prior for general purposes. Specifically, we extend the RetNet's temporal decay mechanism to the spatial domain, and propose a spatial decay matrix based on the Manhattan distance to introduce the explicit spatial prior to Self-Attention. Additionally, an attention decomposition form that adeptly adapts to explicit spatial prior is proposed, aiming to reduce the computational burden of modeling global information without disrupting the spatial decay matrix. Based on the spatial decay matrix and the attention decomposition form, we can flexibly integrate explicit spatial prior into the vision backbone with linear complexity. Extensive experiments demonstrate that RMT exhibits exceptional performance across various vision tasks. Specifically, without extra training data, RMT achieves 84.8% and 86.1% top-1 acc on ImageNet-1k with 27M/4.5GFLOPs and 96M/18.2GFLOPs. For downstream tasks, RMT achieves 54.5 box AP and 47.2 mask AP on the COCO detection task, and 52.8 mIoU on the ADE20K se-mantic segmentation task.

关键词： vision Transformer

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共500页 << < 23 24 25 26 27 28 29 30 31 32 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：