Search Results - Inner Mongolia University Library

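Help

Field codes:

T = title (book title, article title); A = author (responsible party); K = subject term; P = publication name; PU = publisher name; O = organization (author affiliation, degree-granting institution, patent applicant); L = Chinese Library Classification number; C = subject classification number; U = all fields; Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase and have exactly one space on each side.

Examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", C++ or Basic in any field, excluding subject term Visual, years 2011 to 2016)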
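The field codes and operator rules above fit in a small query builder. The Python sketch below is a hypothetical helper for composing query strings before you paste them into the search box. The function names `term`, `any_of` and `query` are illustrative and are not part of the library's system. The only rules it encodes are the ones stated above: known field codes, and uppercase AND/OR/NOT with one space on each side.

```python
# Hypothetical helper: field codes and operator rules are taken from the help text;
# the library's search engine is not called or modelled here.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Format one field-qualified term, e.g. K=图书馆学."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def any_of(*terms: str) -> str:
    """Build a parenthesised OR-group, e.g. (U=C++ OR U=Basic)."""
    return "(" + " OR ".join(terms) + ")"


def query(first: str, *rest: tuple[str, str]) -> str:
    """Chain parts left to right; each later part is an (operator, part) pair.

    Operators must be uppercase and are emitted with one space on each side.
    """
    out = first
    for op, part in rest:
        if op not in OPERATORS:
            raise ValueError(f"operator must be uppercase AND/OR/NOT, got {op!r}")
        out += f" {op} {part}"
    return out


if __name__ == "__main__":
    # Rebuilds Example 2 from the help text.
    print(query(
        term("P", "计算机应用与软件"),
        ("AND", any_of(term("U", "C++"), term("U", "Basic"))),
        ("NOT", term("K", "Visual")),
        ("AND", term("Y", "2011-2016")),
    ))
```

Rejecting lowercase operators up front catches the most common mistake the rules warn about, before the query is ever submitted.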

Refine Results

Document Type

20,860 conference papers
105 journal articles
43 books

Holdings

21,007 electronic documents
1 title in print holdings

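Subject Classification

13,620 Engineering
- 11,056 Computer Science and Technology...
- 2,652 Mechanical Engineering
- 2,252 Software Engineering
- 914 Optical Engineering
- 885 Electrical Engineering
- 529 Control Science and Engineering
- 477 Information and Communication Engineering
- 216 Surveying and Mapping Science and Technology
- 135 Bioengineering
- 127 Biomedical Engineering (may confer...
- 98 Electronic Science and Technology (may...
- 92 Instrument Science and Technology
- 46 Safety Science and Engineering
- 40 Architecture
- 40 Chemical Engineering and Technology
- 39 Civil Engineering
- 37 Transportation Engineering
- 35 Mechanics (may confer Engineering or Science...
- 33 Aeronautical and Astronautical Science and Tech...
3,494 Medicine
- 3,489 Clinical Medicine
- 32 Basic Medicine (may confer Medicine...
2,247 Science
- 1,145 Physics
- 1,081 Mathematics
- 401 Biology
- 384 Statistics (may confer Science or...
- 245 Systems Science
- 46 Chemistry
343 Management
- 176 Management Science and Engineering (may...
- 168 Library, Information and Archives Manag...
- 34 Business Administration
31 Law
19 Agriculture
15 Education
8 Economics
5 Art
2 Military Science
1 Literature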

Topics

8,141 computer vision
2,886 training
2,841 pattern recognit...
1,809 computational mo...
1,715 visualization
1,493 cameras
1,433 three-dimensiona...
1,433 feature extracti...
1,366 shape
1,360 face recognition
1,243 image segmentati...
1,135 robustness
1,124 semantics
992 computer archite...
985 object detection
982 layout
959 benchmark testin...
935 codes
900 computer science
898 object recogniti...

Institutions

174 univ sci & techn...
158 univ chinese aca...
153 carnegie mellon ...
145 chinese univ hon...
109 microsoft resear...
103 zhejiang univ pe...
99 swiss fed inst t...
95 tsinghua univers...
90 microsoft res as...
90 tsinghua univ pe...
88 shanghai ai lab ...
81 zhejiang univers...
77 alibaba grp peop...
74 hong kong univ s...
73 university of sc...
72 peking univ peop...
72 university of ch...
68 shanghai jiao to...
66 univ oxford oxfo...
65 google res mount...

Authors

80 van gool luc
70 zhang lei
58 timofte radu
48 yang yi
47 luc van gool
46 xiaoou tang
44 tian qi
43 darrell trevor
42 loy chen change
42 sun jian
41 qi tian
40 li stan z.
38 li fei-fei
37 chen xilin
36 shan shiguang
35 zhou jie
35 vasconcelos nuno
35 liu yang
35 torralba antonio
34 liu xiaoming

Language

20,982 English
10 Chinese
7 Other
5 Turkish
2 Japanese
2 Portuguese

Search condition: Any field = "2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016"

21,008 records in total; showing records 351-360.

TransNeXt: Robust Foveal Visual Perception for Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Shi, Dai

ISBN: (print) 9798350353006

Due to the depth degradation effect in residual connections, many efficient vision Transformer models that rely on stacking layers for information exchange often fail to form sufficient information mixing, leading to unnatural visual perception. To address this issue, in this paper, we propose Aggregated Attention, a biomimetic design-based token mixer that simulates biological foveal vision and continuous eye movement while enabling each token on the feature map to have a global perception. Furthermore, we incorporate learnable tokens that interact with conventional queries and keys, which further diversifies the generation of affinity matrices beyond merely relying on the similarity between queries and keys. Our approach does not rely on stacking for information exchange, thus effectively avoiding depth degradation and achieving natural visual perception. Additionally, we propose Convolutional GLU, a channel mixer that bridges the gap between GLU and the SE mechanism, which empowers each token to have channel attention based on its nearest neighbor image features, enhancing local modeling capability and model robustness. We combine Aggregated Attention and Convolutional GLU to create a new visual backbone called TransNeXt. Extensive experiments demonstrate that our TransNeXt achieves state-of-the-art performance across multiple model sizes. At a resolution of 224², TransNeXt-Tiny attains an ImageNet accuracy of 84.0%, surpassing ConvNeXt-B with 69% fewer parameters. Our TransNeXt-Base achieves an ImageNet accuracy of 86.2% and an ImageNet-A accuracy of 61.6% at a resolution of 384², a COCO object detection mAP of 57.1, and an ADE20K semantic segmentation mIoU of 54.7.

Keywords: Aggregated Attention; Biomimetic Vision Design; Convolutional GLU; Efficient Transformer; Foveal Visual Perception; Image Classification; Image Segmentation; ImageNet-1K; ImageNet-Adversarial; Large-Kernel Convolution; Length-Scaled Cosine Attention; Multi-Scale Extrapolation; Object Detection; Perceptual Artifacts; Pixel-focused Attention; Robustness; Self-Attention; Vision Transformer; Visual Backbone
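The abstract describes Convolutional GLU only at a high level: a GLU-style channel mixer whose gating gives each token channel attention based on its neighbouring features. The pure-Python 1D toy below illustrates that general idea. It multiplies a value branch by a sigmoid gate, and the gate comes from a depthwise (per-channel) convolution over neighbouring tokens. It is not the paper's layer. The real mixer works on 2D feature maps with learned projections, and the abstract does not give its activation or kernel size. The sigmoid gate is borrowed from the original GLU, and all function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def depthwise_conv1d(tokens, kernel):
    """Per-channel 1D convolution over the token axis with zero padding.

    tokens: list of T token vectors, each with C channels.
    kernel: list of C odd-length kernels, one per channel (depthwise).
    """
    t_len, channels = len(tokens), len(tokens[0])
    out = []
    for t in range(t_len):
        row = []
        for c in range(channels):
            k = kernel[c]
            half = len(k) // 2
            acc = 0.0
            for j, w in enumerate(k):
                src = t + j - half  # neighbour index; out-of-range counts as zero
                if 0 <= src < t_len:
                    acc += w * tokens[src][c]
            row.append(acc)
        out.append(row)
    return out


def conv_gated_unit(values, gate_inputs, kernel):
    """Elementwise value * sigmoid(depthwise_conv(gate_inputs)).

    Each token's gate per channel depends on its neighbours, which is the
    'channel attention from nearest neighbours' idea in toy form.
    """
    gates = depthwise_conv1d(gate_inputs, kernel)
    return [[v * sigmoid(g) for v, g in zip(v_row, g_row)]
            for v_row, g_row in zip(values, gates)]
```

In a real network both branches would be linear projections of the same input, followed by an output projection. They are passed in separately here to keep the gating step visible.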

Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Yifei; Chen, Dapeng; Liu, Ruijin; Zhou, Sai; Xue, Wenyuan; Peng, Wei

Affiliation: Huawei Technol IT Innovat & Res Ctr Shenzhen Peoples R China

ISBN: (print) 9798350353006

Large-scale visual-language pre-trained models have achieved significant success in various video tasks. However, most existing methods follow an "adapt then align" paradigm, which adapts pre-trained image encoders to model video-level representations and utilizes one-hot or text embeddings of the action labels for supervision. This paradigm overlooks the challenge of mapping from static images to complicated activity concepts. In this paper, we propose a novel Align before Adapt (ALT) paradigm. Prior to adapting to video representation learning, we exploit the entity-to-region alignments for each frame. The alignments are fulfilled by matching the region-aware image embeddings to an offline-constructed text corpus. With the aligned entities, we feed their text embeddings to a transformer-based video adapter as the queries, which can help extract the semantics of the most important entities from a video to a vector. This paradigm reuses the visual-language alignment of visual-language pre-training (VLP) during adaptation and tries to explain an action by the underlying entities. This helps understand actions by bridging the gap with complex activity semantics, particularly when facing unfamiliar or unseen categories. ALT demonstrates competitive performance while maintaining remarkably low computational costs. In fully supervised experiments, it achieves 88.1% top-1 accuracy on Kinetics-400 with only 4947 GFLOPs. Moreover, ALT outperforms the previous state-of-the-art methods in both zero-shot and few-shot experiments, emphasizing its superior generalizability across various learning scenarios.

Keywords: Video action recognition; visual-language model
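The alignment step in this abstract matches region-aware image embeddings to an offline-constructed text corpus. A nearest-neighbour lookup illustrates it. The minimal pure-Python sketch below assumes cosine similarity as the matching score and uses toy 2D vectors in place of real embeddings. `align_regions` and the corpus entries are hypothetical, and the abstract does not say how the paper scores or filters matches.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def align_regions(region_embeddings, corpus, top_k=1):
    """For each region embedding, return the top_k corpus entities by cosine similarity.

    corpus: dict mapping an entity name to its (assumed precomputed) text embedding.
    """
    results = []
    for region in region_embeddings:
        ranked = sorted(corpus, key=lambda name: cosine(region, corpus[name]), reverse=True)
        results.append(ranked[:top_k])
    return results


if __name__ == "__main__":
    toy_corpus = {"ball": [1.0, 0.0], "person": [0.0, 1.0], "bat": [0.7, 0.7]}
    print(align_regions([[0.9, 0.1], [0.1, 0.95]], toy_corpus))
```

The matched entity names are what the abstract then turns into text-embedding queries for the video adapter. That adapter is not sketched here.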

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yu, Qifan; Li, Juncheng; Wei, Longhui; Pang, Liang; Ye, Wentao; Qin, Bosheng; Tang, Siliang; Tian, Qi; Zhuang, Yueting

Affiliations: Zhejiang Univ Hangzhou Peoples R China; Huawei Cloud Suzhou Peoples R China; Chinese Acad Sci Inst Comp Technol Beijing Peoples R China

ISBN: (print) 9798350353006

Multi-modal Large Language Models (MLLMs) tuned on machine-generated instruction-following data have demonstrated remarkable performance in various multi-modal understanding and generation tasks. However, the hallucinations inherent in machine-generated data, which could lead to hallucinatory outputs in MLLMs, remain under-explored. This work aims to investigate various hallucinations (i.e., object, relation, and attribute hallucinations) and mitigate those hallucinatory toxicities in large-scale machine-generated visual instruction datasets. Drawing on the human ability to identify factual errors, we present a novel hallucination detection and elimination framework, HalluciDoctor, based on the cross-checking paradigm. We use our framework to identify and eliminate hallucinations in the training data automatically. Interestingly, HalluciDoctor also indicates that spurious correlations arising from long-tail object co-occurrences contribute to hallucinations. Based on that, we execute counterfactual visual instruction expansion to balance data distribution, thereby enhancing MLLMs' resistance to hallucinations. Comprehensive experiments on hallucination evaluation benchmarks show that our method achieves a 44.6% relative reduction in hallucinations while maintaining competitive performance compared to LLaVA. The data and code for this paper are publicly available.

Keywords: Hallucinations; Multi-modal Language Model; Vision-language reasoning

Continual Forgetting for Pre-trained Vision Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhao, Hongbo; Ni, Bolin; Fang, Junsong; Wang, Yuxi; Chen, Yuntao; Meng, Gaofeng; Zhang, Zhaoxiang

Affiliations: Chinese Acad Sci Inst Automat State Key Lab Multimodal Artificial Intelligence Beijing Peoples R China; Chinese Acad Sci Ctr Artificial Intelligence & Robot Hong Kong Inst Sci & Innovat Beijing Peoples R China; Univ Chinese Acad Sci Beijing Peoples R China; Shanghai Artificial Intelligence Lab Shanghai Peoples R China

ISBN: (print) 9798350353006

Due to privacy and security concerns, the need to erase unwanted information from pre-trained vision models is becoming increasingly evident. In real-world scenarios, erasure requests originate at any time from both users and model owners. These requests usually form a sequence. Therefore, under such a setting, selective information is expected to be continuously removed from a pre-trained model while maintaining the rest. We define this problem as continual forgetting and identify two key challenges. (i) For unwanted knowledge, efficient and effective deletion is crucial. (ii) For remaining knowledge, the impact brought by the forgetting procedure should be minimal. To address them, we propose Group Sparse LoRA (GS-LoRA). Specifically, towards (i), we use LoRA modules to fine-tune the FFN layers in Transformer blocks for each forgetting task independently, and towards (ii), a simple group sparse regularization is adopted, enabling automatic selection of specific LoRA groups and zeroing out the others. GS-LoRA is effective, parameter-efficient, data-efficient, and easy to implement. We conduct extensive experiments on face recognition, object detection and image classification and demonstrate that GS-LoRA manages to forget specific classes with minimal impact on other classes. Code will be released at https://***/bjzhb666/GS-LoRA.

Keywords: Continual Forgetting; Machine Unlearning
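GS-LoRA rests on two standard ingredients. The first is a low-rank update W + BA added to a frozen weight. The second is a group-sparse (group-lasso) penalty λ Σ_g ‖θ_g‖₂, which drives whole groups of parameters to zero. The minimal pure-Python sketch below assumes that each LoRA module (its A and B matrices) forms one group:

- `lora_update` adds the scaled low-rank delta to a frozen weight.
- `group_sparse_penalty` computes the group-lasso term.
- `group_soft_threshold` is the standard proximal step that zeroes an entire group once its norm falls below a threshold.

The abstract does not state how the paper groups parameters, weights the penalty or optimizes it, so these are illustrative choices. All names are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def lora_update(w: Matrix, b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """Return W + scale * (B @ A): frozen weight plus a low-rank delta."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


def group_norm(group: list[Matrix]) -> float:
    """L2 norm over every entry of every matrix in one group (e.g. one module's A and B)."""
    return math.sqrt(sum(v * v for m in group for row in m for v in row))


def group_sparse_penalty(groups: list[list[Matrix]], lam: float) -> float:
    """Group-lasso term: lam * sum of per-group L2 norms."""
    return lam * sum(group_norm(g) for g in groups)


def group_soft_threshold(group: list[Matrix], t: float) -> list[Matrix]:
    """Proximal step for the group lasso: shrink the whole group, or zero it if its norm <= t."""
    n = group_norm(group)
    factor = 0.0 if n <= t else 1.0 - t / n
    return [[[v * factor for v in row] for row in m] for m in group]
```

Because the penalty acts on each group's norm rather than on individual entries, a group whose norm falls below the threshold is zeroed as a unit. This mirrors the abstract's "selection of specific LoRA groups and zeroing out the others".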

On Scaling up a Multilingual Vision and Language Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Xi; Djolonga, Josip; Padlewski, Piotr; Mustafa, Basil; Changpinyo, Soravit; Wu, Jialin; Ruiz, Carlos Riquelme; Goodman, Sebastian; Wang, Xiao; Tay, Yi; Shakeri, Siamak; Dehghani, Mostafa; Salz, Daniel; Lucic, Mario; Tschannen, Michael; Nagrani, Arsha; Hu, Hexiang; Joshi, Mandar; Pang, Bo; Montgomery, Ceslee; Pietrzyk, Paulina; Ritter, Marvin; Piergiovanni, A. J.; Minderer, Matthias; Pavetic, Filip; Waters, Austin; Li, Gang; Alabdulmohsin, Ibrahim; Beyer, Lucas; Amelot, Julien; Lee, Kenton; Steiner, Andreas Peter; Li, Yang; Keysers, Daniel; Arnab, Anurag; Xu, Yuanzhong; Rong, Keran; Kolesnikov, Alexander; Seyedhosseini, Mojtaba; Angelova, Anelia; Zhai, Xiaohua; Houlsby, Neil; Soricut, Radu

Affiliation: Google Mountain View CA 94043 USA

ISBN: (print) 9798350353006

We explore the boundaries of scaling up a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. Our model advances the state-of-the-art on most vision-and-language benchmarks considered (20+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.

Keywords: language multimodal pretraining vision

Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhou, Yuchen; Liu, Linkai; Gou, Chao

Affiliation: Sun Yat Sen Univ Guangzhou Peoples R China

ISBN: (print) 9798350353006

Most existing attention prediction research focuses on salient instances like humans and objects. However, the more complex interaction-oriented attention, arising from the comprehension of interactions between instances by human observers, remains largely unexplored. This is equally crucial for advancing human-machine interaction and human-centered artificial intelligence. To bridge this gap, we first collect a novel gaze fixation dataset named IG, comprising 530,000 fixation points across 740 diverse interaction categories, capturing visual attention during human observers' cognitive processes of interactions. Subsequently, we introduce the zero-shot interaction-oriented attention prediction task (ZeroIA), which challenges models to predict visual cues for interactions not encountered during training. Thirdly, we present the Interactive Attention model (IA), designed to emulate human observers' cognitive processes to tackle the ZeroIA problem. Extensive experiments demonstrate that the proposed IA outperforms other state-of-the-art approaches in both ZeroIA and fully supervised settings. Lastly, we endeavor to apply interaction-oriented attention to the interaction recognition task itself. Further experimental results demonstrate the promising potential to enhance the performance and interpretability of existing state-of-the-art HOI models by incorporating real human attention data from IG and attention labels generated by IA.

Keywords: Action understanding; Attention prediction; Gaze; Human-Object Interaction Detection; Saliency; Visual Attention; Zero-shot Learning

Exploring and Utilizing Pattern Imbalance

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mei, Shibin; Zhao, Chenglong; Yuan, Shengchao; Ni, Bingbing

Affiliation: Shanghai Jiao Tong Univ Shanghai 200240 Peoples R China

ISBN: (print) 9798350301298

In this paper, we identify pattern imbalance from several aspects, and further develop a new training scheme to avert pattern preference as well as spurious correlation. In contrast to prior methods, which are mostly concerned with category or domain granularity and ignore the potential finer structure that exists in datasets, we give a new definition of seed category as an appropriate optimization unit to distinguish different patterns in the same category or domain. Extensive experiments on domain generalization datasets of diverse scales demonstrate the effectiveness of the proposed method.

Keywords: Datasets and evaluation

Pre-training Vision Models with Mandelbulb Variations

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chiche, Benjamin Naoto; Horikawa, Yuto; Fujita, Ryo

Affiliation: Rist Inc 830 Hongakujimae-Cho Gojo-Dori Shimogyo-Ku Kyoto 6008102 Japan

ISBN: (print) 9798350353006

The use of models that have been pre-trained on natural image datasets like ImageNet may face some limitations. First, this use may be restricted due to copyright and license on the training images, and privacy laws. Second, these datasets and models may incorporate societal and ethical biases. Formula-driven supervised learning (FDSL) enables model pre-training to circumvent these issues. This consists of generating a synthetic image dataset based on mathematical formulae and pre-training the model on it. In this work, we propose novel FDSL datasets based on Mandelbulb Variations. These datasets contain RGB images that are projections of colored objects deriving from the 3D Mandelbulb fractal. Pre-training ResNet-50 on one of our proposed datasets, MandelbulbVAR-1k, enables an average top-1 accuracy over target classification datasets that is at least 1% higher than pre-training on existing FDSL datasets. With regard to anomaly detection on MVTec AD, pre-training the WideResNet-50 backbone on MandelbulbVAR-1k enables PatchCore to achieve 97.2% average image-level AUROC. This is only 1.9% lower than pre-training on ImageNet-1k (99.1%) and 4.5% higher than pre-training on the second-best performing FDSL dataset, i.e., VisualAtom-1k (92.7%). Regarding Vision Transformer (ViT) pre-training, another dataset that we propose and coin MandelbulbVAR-Hybrid-21k enables ViT-Base to achieve 82.2% top-1 accuracy on ImageNet-1k, which is 0.4% higher than pre-training on ImageNet-21k (81.8%) and only 0.1% lower than pre-training on VisualAtom-1k (82.3%).

Keywords: anomaly detection; classification; convolutional neural network; Formula-driven supervised learning; fractal; mandelbulb; pre-training; vision transformer
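The Mandelbulb fractal is commonly defined by the White-Nylander power-n iteration v ← vⁿ + c in 3D spherical coordinates, usually with n = 8. The pure-Python sketch below is a minimal escape-time membership test under that standard definition. It does not reproduce the paper's "variations", colouring or projection to RGB images. The `power`, `max_iter` and `bailout` defaults are illustrative assumptions.

```python
import math


def mandelbulb_step(v, c, power=8):
    """One iteration v -> v**power + c using the spherical-coordinate power formula."""
    x, y, z = v
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return c  # 0 ** power is 0, so only c remains
    theta = math.acos(max(-1.0, min(1.0, z / r)))  # clamp guards against rounding
    phi = math.atan2(y, x)
    rn = r ** power
    return (
        rn * math.sin(power * theta) * math.cos(power * phi) + c[0],
        rn * math.sin(power * theta) * math.sin(power * phi) + c[1],
        rn * math.cos(power * theta) + c[2],
    )


def in_mandelbulb(c, power=8, max_iter=32, bailout=2.0):
    """Escape-time test: True if the orbit of 0 stays within `bailout` for max_iter steps."""
    v = (0.0, 0.0, 0.0)
    for _ in range(max_iter):
        v = mandelbulb_step(v, c, power)
        if math.sqrt(sum(t * t for t in v)) > bailout:
            return False
    return True
```

Sampling `in_mandelbulb` on a 3D grid yields a voxel approximation of the shape. A rendering pipeline like the one the abstract describes would then colour and project such a shape into 2D images.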

Adapting Short-Term Transformers for Action Detection in Untrimmed Videos

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yang, Min; Gao, Huan; Guo, Ping; Wang, Limin

Affiliations: Nanjing Univ State Key Lab Novel Software Technol Nanjing Peoples R China; Inchitech Beijing Peoples R China; Intel Labs China Hillsboro OR USA; Shanghai AI Lab Shanghai Peoples R China

ISBN: (print) 9798350353006

Vision Transformer (ViT) has shown high potential in video recognition, owing to its flexible design, adaptable self-attention mechanisms, and the efficacy of masked pretraining. Yet, it remains unclear how to adapt these pretrained short-term ViTs for temporal action detection (TAD) in untrimmed videos. The existing works treat them as off-the-shelf feature extractors for each short-trimmed snippet without capturing the fine-grained relation among different snippets in a broader temporal context. To mitigate this issue, this paper focuses on designing a new mechanism for adapting these pre-trained ViT models as a unified long-form video transformer to fully unleash their modeling power in capturing inter-snippet relation, while still keeping low computation overhead and memory consumption for efficient TAD. To this end, we design effective cross-snippet propagation modules to gradually exchange short-term video information among different snippets from two levels. For inner-backbone information propagation, we introduce a cross-snippet propagation strategy to enable multi-snippet temporal feature interaction inside the backbone. For post-backbone information propagation, we propose temporal transformer layers for further clip-level modeling. With the plain ViT-B pre-trained with VideoMAE, our end-to-end temporal action detector (ViT-TAD) performs very competitively with previous temporal action detectors, reaching up to 69.5 average mAP on THUMOS14, 37.40 average mAP on ActivityNet-1.3, and 17.20 average mAP on FineAction.

Keywords: temporal action detection; Vision Transformer
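The "average mAP" figures quoted above are conventionally the mean of mAP over several temporal-IoU (tIoU) thresholds. On THUMOS14 these are commonly 0.3 to 0.7. The pure-Python sketch below shows the pieces for a single class:

- `tiou` computes the temporal IoU of two segments.
- `average_precision` greedily matches score-ranked predictions to ground truth and uses one common non-interpolated AP variant.
- `average_over_thresholds` averages the per-threshold values.

Official benchmark toolkits use their own matching and interpolation details. Numbers from this sketch are therefore not comparable to the reported results, and all names are hypothetical.

```python
def tiou(a, b):
    """Temporal IoU of two segments given as (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, threshold):
    """Non-interpolated AP for one class at one tIoU threshold.

    preds: list of (start, end, score); gts: list of (start, end).
    Each ground-truth segment can be matched at most once.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    tp = 0
    precision_sum = 0.0
    ranked = sorted(preds, key=lambda p: p[2], reverse=True)
    for rank, (start, end, _score) in enumerate(ranked, start=1):
        best, best_iou = -1, threshold
        for i, gt in enumerate(gts):
            iou = tiou((start, end), gt)
            if not matched[i] and iou >= best_iou:
                best, best_iou = i, iou
        if best >= 0:  # true positive: record precision at this rank
            matched[best] = True
            tp += 1
            precision_sum += tp / rank
    return precision_sum / len(gts)


def average_over_thresholds(values):
    """Mean of per-threshold (m)AP values, e.g. one value per tIoU in 0.3..0.7."""
    return sum(values) / len(values)
```

Full mAP would additionally average `average_precision` over action classes at each threshold, before `average_over_thresholds` is applied.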

EvDiG: Event-guided Direct and Global Components Separation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhou, Xinyu; Duan, Peiqi; Li, Boyu; Zhou, Chu; Xu, Chao; Shi, Boxin

Affiliations: Peking Univ Sch Intelligence Sci & Technol Natl Key Lab Gen AI Beijing Peoples R China; Peking Univ Sch Comp Sci Natl Key Lab Multimedia Informat Proc Beijing Peoples R China; Peking Univ Sch Comp Sci Natl Engn Res Ctr Visual Technol Beijing Peoples R China

ISBN: (print) 9798350353006

Separating the direct and global components of a scene aids in shape recovery and basic material understanding. Conventional methods capture multiple frames under high-frequency illumination patterns or shadows, requiring the scene to remain stationary during the image acquisition process. Single-frame methods simplify the capture procedure but yield lower-quality separation results. In this paper, we leverage the event camera to facilitate the separation of direct and global components, enabling high-quality separation at video rate. In detail, we adopt an event camera to record rapid illumination changes caused by the shadow of a line occluder sweeping over the scene, and reconstruct the coarse separation results through event accumulation. We then design a network to resolve the noise in the coarse separation results and restore color information. A real-world dataset is collected using a hybrid camera system for network training and evaluation. Experimental results show superior performance over state-of-the-art methods.

Keywords: direct-global separation; event-guided vision
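For context, the "conventional" multi-frame approach this abstract contrasts with is typified by the high-frequency-illumination method of Nayar et al. Take a half-lit pattern shifted so that every scene point is seen both lit and unlit. Each pixel then satisfies L_max ≈ L_d + L_g/2 and L_min ≈ L_g/2, which gives L_d = L_max − L_min and L_g = 2·L_min. The pure-Python sketch below assumes exactly that half-lit setting. It is not EvDiG's event-based pipeline, and the function names are hypothetical.

```python
def separate_pixel(samples):
    """Direct/global split for one pixel seen under shifted half-lit patterns.

    samples: that pixel's intensities across the captured frames.
    Assumes L_max = L_d + L_g / 2 (lit) and L_min = L_g / 2 (unlit).
    """
    l_max, l_min = max(samples), min(samples)
    return l_max - l_min, 2.0 * l_min


def separate_frames(frames):
    """Apply separate_pixel to every pixel; frames is a list of equally sized 2D images."""
    h, w = len(frames[0]), len(frames[0][0])
    direct_img = [[0.0] * w for _ in range(h)]
    global_img = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            d, g = separate_pixel([f[i][j] for f in frames])
            direct_img[i][j], global_img[i][j] = d, g
    return direct_img, global_img
```

The need for every pixel to be observed in both states is why such methods require a stationary scene across frames. That is the limitation the event-camera approach above is designed to relax.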

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code: 010021
