Search Results - Inner Mongolia University Library

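Search help

Field codes:

T=Title (book title, article title), A=Author (responsible party), K=Subject term, P=Publication name, PU=Publisher name, O=Organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=Subject classification number, U=All fields, Y=Year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term is library science or information science, author is 范并思, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication name is 计算机应用与软件, any field contains C++ or Basic, subject term does not contain Visual, years 2011 to 2016)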
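The field codes and operator rules above can be assembled programmatically. The following is a minimal sketch, assuming the query is simply a string in the syntax described here; the helper names (`term`, `combine`, `group`, `year_range`) are hypothetical and are not part of the library site or of any published API.

```python
# Hypothetical helpers for composing an advanced-search expression.
# Only the field codes and operators come from the help text; all
# function names are illustrative assumptions.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by one space per side."""
    if operator not in OPERATORS:
        raise ValueError("operator must be AND, OR, or NOT in uppercase")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def year_range(start: int, end: int) -> str:
    """Return a Y=start-end term, as used in the examples."""
    return term("Y", f"{start}-{end}")


# Rebuild Example 1 from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    year_range(1982, 2016),
)
print(example_one)
```

Building terms through one function keeps the uppercase-operator and single-space rules in one place, so a malformed operator such as `and` is rejected before the query is submitted.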


Search condition: "Any field=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,891 records in total; showing records 1231-1240


MSCC: Multi-Scale Transformers for Camera Calibration

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: Song, Xu; Kang, Hao; Moteki, Atsunori; Suzuki, Genta; Kobayashi, Yoshie; Tan, Zhiming

Affiliations: Fujitsu R&D Ctr Co Ltd, Beijing, Peoples R China; Fujitsu Ltd, Tokyo, Japan

ISBN: (print) 9798350318920; 9798350318937

Camera calibration is very important for some vision tasks, such as rendering 3D scenes, environment reconstruction, and self-localization. In this paper, we propose a framework of multi-scale transformers for camera calibration. With the input of a single image, the multi-scale features output from the model's backbone are utilized to estimate camera parameters. At the same time, we show that a coarse-to-fine approach is effective at locating global structures and detailed features in the image, by studying the attention response of horizon line estimation. Moreover, deep supervision is shown to give more precise results and accelerated training. Our method outperforms all the state-of-the-art methods in objective and subjective experiments on the Google Street View dataset and Pano360.

Keywords: Algorithms: 3D computer vision; Algorithms: Image recognition and understanding

Dynamic Generative Targeted Attacks with Pattern Injection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Feng, Weiwei; Xu, Nanqing; Zhang, Tianzhu; Zhang, Yongdong

Affiliations: Univ Sci & Technol China, Hefei, Peoples R China; Deep Space Explorat Lab, Hefei, Peoples R China

ISBN: (print) 9798350301298

Adversarial attacks can evaluate model robustness and have been of great concern in recent years. Among various attacks, targeted attacks aim at misleading victim models to output adversary-desired predictions, which are more challenging and threatening than untargeted ones. Existing targeted attacks can be roughly divided into instance-specific and instance-agnostic attacks. Instance-specific attacks craft adversarial examples via iterative gradient updating on the specific instance. In contrast, instance-agnostic attacks learn a universal perturbation or a generative model on the global dataset to perform attacks. However, they rely too much on the classification boundary of substitute models, ignoring the realistic distribution of the target class, which may result in limited targeted attack performance. Moreover, there has been no attempt to simultaneously combine the information of the specific instance and the global dataset. To deal with these limitations, we first conduct an analysis via a causal graph and propose to craft transferable targeted adversarial examples by injecting target patterns. Based on this analysis, we introduce a generative attack model composed of a cross-attention guided convolution module and a pattern injection module. Concretely, the former adopts a dynamic convolution kernel and a static convolution kernel for the specific instance and the global dataset, respectively, which can inherit the advantages of both instance-specific and instance-agnostic attacks. The pattern injection module utilizes a pattern prototype to encode target patterns, which can guide the generation of targeted adversarial examples. We also provide rigorous theoretical analysis to guarantee the effectiveness of our method. Extensive experiments demonstrate that our method outperforms 10 existing adversarial attacks against 13 models.

Keywords: Adversarial attack and defense

PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Grainger, Ryan; Paniagua, Thomas; Song, Xi; Cuntoor, Naresh; Lee, Mun Wai; Wu, Tianfu

Affiliations: NC State, Dept ECE, Raleigh, NC 27695, USA; BlueHalo, Arlington, VA, USA

ISBN: (print) 9798350301298

Vision Transformers (ViTs) are built on the assumption of treating image patches as "visual tokens" and learn patch-to-patch attention. The patch embedding based tokenizer has a semantic gap with respect to its counterpart, the textual tokenizer. The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it nontrivial to explain learned ViTs. To address these issues in ViT, this paper proposes to learn Patch-to-Cluster attention (PaCa) in ViT. Queries in our PaCa-ViT start with patches, while keys and values are directly based on clustering (with a predefined small number of clusters). The clusters are learned end-to-end, leading to better tokenizers and inducing joint clustering-for-attention and attention-for-clustering for better and interpretable models. The quadratic complexity is relaxed to linear complexity. The proposed PaCa module is used in designing efficient and interpretable ViT backbones and semantic segmentation head networks. In experiments, the proposed methods are tested on ImageNet-1k image classification, MS-COCO object detection and instance segmentation and MIT-ADE20k semantic segmentation. Compared with the prior art, it obtains better performance in all the three benchmarks than the SWin [32] and the PVTs [47, 48] by significant margins in ImageNet-1k and MIT-ADE20k. It is also significantly more efficient than PVT models in MS-COCO and MIT-ADE20k due to the linear complexity. The learned clusters are semantically meaningful. Code and model checkpoints are available at https://***/iVMCL/PaCaViT.

Keywords: Deep learning architectures and techniques

PHA: Patch-wise High-frequency Augmentation for Transformer-based Person Re-identification

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Guiwei; Zhang, Yongfei; Zhang, Tianyu; Li, Bo; Pu, Shiliang

Affiliations: Beihang Univ, Beijing Key Lab Digital Media, Sch Comp Sci & Engn, Beijing, Peoples R China; Beihang Univ, State Key Lab Virtual Real Technol & Syst, Beijing, Peoples R China; Pengcheng Lab, Shenzhen, Peoples R China; Hikvis Res Inst, Hangzhou, Peoples R China

ISBN: (print) 9798350301298

Although recent studies empirically show that injecting Convolutional Neural Networks (CNNs) into Vision Transformers (ViTs) can improve the performance of person re-identification, the rationale behind it remains elusive. From a frequency perspective, we reveal that ViTs perform worse than CNNs in preserving key high-frequency components (e.g., clothes texture details) since high-frequency components are inevitably diluted by low-frequency ones due to the intrinsic Self-Attention within ViTs. To remedy such inadequacy of the ViT, we propose a Patch-wise High-frequency Augmentation (PHA) method with two core designs. First, to enhance the feature representation ability of high-frequency components, we split patches with high-frequency components by the Discrete Haar Wavelet Transform, then empower the ViT to take the split patches as auxiliary input. Second, to prevent high-frequency components from being diluted by low-frequency ones when taking the entire sequence as input during network optimization, we propose a novel patch-wise contrastive loss. From the view of gradient optimization, it acts as an implicit augmentation to improve the representation ability of key high-frequency components. This benefits the ViT to capture key high-frequency components to extract discriminative person representations. PHA is necessary during training and can be removed during inference, without bringing extra complexity. Extensive experiments on widely-used ReID datasets validate the effectiveness of our method.

Keywords: Recognition: Categorization, detection, retrieval

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Youngwook; Kim, Jae Myung; Jeong, Jieun; Schmid, Cordelia; Akata, Zeynep; Lee, Jungwoo

Affiliations: Seoul Natl Univ, Seoul, South Korea; Univ Tubingen, Tubingen, Germany; HodooAI Lab, Ho Chi Minh City, Vietnam; PSL Res Univ, CNRS, Ecole Normale Super, Inria, Paris, France; MPI Intelligent Syst, Stuttgart, Germany

ISBN: (print) 9798350301298

Due to the expensive costs of collecting labels in multi-label classification datasets, partially annotated multi-label classification has become an emerging field in computer vision. One baseline approach to this task is to assume unobserved labels as negative labels, but this assumption induces label noise as a form of false negative. To understand the negative impact caused by false negative labels, we study how these labels affect the model's explanation. We observe that the explanation of two models, trained with full and partial labels each, highlights similar regions but with different scaling, where the latter tends to have lower attribution scores. Based on these findings, we propose to boost the attribution scores of the model trained with partial labels to make its explanation resemble that of the model trained with full labels. Even with the conceptually simple approach, the multi-label classification performance improves by a large margin in three different datasets on a single positive label setting and one on a large-scale partial label setting. Code is available at https://***/youngwk/BridgeGapExplanationPAMC.

Keywords: Recognition: Categorization, detection, retrieval

CREPE: Can Vision-Language Foundation Models Reason Compositionally?

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ma, Zixian; Hong, Jerry; Gul, Mustafa Omer; Gandhi, Mona; Gao, Irena; Krishna, Ranjay

Affiliations: Stanford Univ, Stanford, CA 94305, USA; Cornell Univ, Ithaca, NY, USA; Univ Penn, Philadelphia, PA, USA; Univ Washington, Seattle, WA, USA

ISBN: (print) 9798350301298

A fundamental characteristic common to both human vision and natural language is their compositional nature. Yet, despite the performance gains contributed by large vision and language pretraining, we find that, across 7 architectures trained with 4 algorithms on massive datasets, they struggle at compositionality. To arrive at this conclusion, we introduce a new compositionality evaluation benchmark, CREPE, which measures two important aspects of compositionality identified by cognitive science literature: systematicity and productivity. To measure systematicity, CREPE consists of a test dataset containing over 370K image-text pairs and three different seen-unseen splits. The three splits are designed to test models trained on three popular training datasets: CC-12M, YFCC-15M, and LAION-400M. We also generate 325K, 316K, and 309K hard negative captions for a subset of the pairs. To test productivity, CREPE contains 17K image-text pairs with nine different complexities plus 278K hard negative captions with atomic, swapping and negation foils. The datasets are generated by repurposing the Visual Genome scene graphs and region descriptions and applying handcrafted templates and GPT-3. For systematicity, we find that model performance decreases consistently when novel compositions dominate the retrieval set, with Recall@1 dropping by up to 9%. For productivity, models' retrieval success decays as complexity increases, frequently nearing random chance at high complexity. These results hold regardless of model and training dataset size.

Keywords: Vision, language, and reasoning

Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Nakhli, Ramin; Moghadam, Puria Azadi; Mi, Haoyang; Farahani, Hossein; Baras, Alexander; Gilks, Blake; Bashashati, Ali

Affiliations: Univ British Columbia, Vancouver, BC, Canada; Johns Hopkins Univ, Baltimore, MD, USA

ISBN: (print) 9798350301298

Processing giga-pixel whole slide histopathology images (WSI) is a computationally expensive task. Multiple instance learning (MIL) has become the conventional approach to process WSIs, in which these images are split into smaller patches for further processing. However, MIL-based techniques ignore explicit information about the individual cells within a patch. In this paper, by defining the novel concept of shared-context processing, we designed a multi-modal Graph Transformer (AMIGO) that uses the cellular graph within the tissue to provide a single representation for a patient while taking advantage of the hierarchical structure of the tissue, enabling a dynamic focus between cell-level and tissue-level information. We benchmarked the performance of our model against multiple state-of-the-art methods in survival prediction and showed that ours can significantly outperform all of them including hierarchical Vision Transformer (ViT). More importantly, we show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data. Finally, in two different cancer datasets, we demonstrated that our model was able to stratify the patients into low-risk and high-risk groups while other state-of-the-art methods failed to achieve this goal. We also publish a large dataset of immunohistochemistry images (InUIT) containing 1,600 tissue microarray (TMA) cores from 188 patients along with their survival information, making it one of the largest publicly available datasets in this context.

Keywords: Medical and biological vision, cell microscopy

Contrastive Learning for Multi-Object Tracking with Transformers

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Authors: De Plaen, Pierre-Francois; Marinello, Nicola; Proesmans, Marc; Tuytelaars, Tinne; Van Gool, Luc

Affiliations: Katholieke Univ Leuven, ESAT PSI, Leuven, Belgium; Swiss Fed Inst Technol, CVL, Zurich, Switzerland; TRACE Vzw, Leuven, Belgium

ISBN: (print) 9798350318920; 9798350318937

The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-level contrastive loss, a revised sampling strategy and a lightweight assignment method. Our training scheme learns object appearances while preserving detection capabilities and with little overhead. Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset and is comparable to existing transformer-based methods on the MOT17 dataset.

Keywords: Algorithms: Image recognition and understanding; Algorithms: Video recognition and understanding; Applications: Autonomous Driving

3Mformer: Multi-order Multi-mode Transformer for Skeletal Action Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Lei; Koniusz, Piotr

Affiliations: Australian Natl Univ, Canberra, ACT, Australia; Data61 CSIRO, Eveleigh, Australia

ISBN: (print) 9798350301298

Many skeletal action recognition models use GCNs to represent the human body by 3D body joints connected body parts. GCNs aggregate one- or few-hop graph neighbourhoods, and ignore the dependency between not linked body joints. We propose to form a hypergraph to model hyper-edges between graph nodes (e.g., third- and fourth-order hyper-edges capture three and four nodes) which help capture higher-order motion patterns of groups of body joints. We split action sequences into temporal blocks; the Higher-order Transformer (HoT) produces embeddings of each temporal block based on (i) the body joints, (ii) pairwise links of body joints and (iii) higher-order hyper-edges of skeleton body joints. We combine such HoT embeddings of hyper-edges of orders 1,..., r by a novel Multi-order Multi-mode Transformer (3Mformer) with two modules whose order can be exchanged to achieve coupled-mode attention on coupled-mode tokens based on 'channel-temporal block', 'order-channel-body joint', 'channel-hyper-edge (any order)' and 'channel-only' pairs. The first module, called Multi-order Pooling (MP), additionally learns weighted aggregation along the hyper-edge mode, whereas the second module, Temporal block Pooling (TP), aggregates along the temporal block mode. Our end-to-end trainable network yields state-of-the-art results compared to GCN-, transformer- and hypergraph-based counterparts.

Keywords: Video: Action and event understanding

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Luo, Dezhao; Huang, Jiabo; Gong, Shaogang; Jin, Hailin; Liu, Yang

Affiliations: Queen Mary Univ London, London, England; Adobe Res, San Francisco, CA, USA; Peking Univ, WICT, Beijing, Peoples R China

ISBN: (print) 9798350301298

The correlation between vision and text is essential for video moment retrieval (VMR); however, existing methods rely heavily on separate pre-training feature extractors for visual and textual understanding. Without sufficient temporal boundary annotations, it is non-trivial to learn universal video-text alignments. In this work, we explore multi-modal correlations derived from large-scale image-text data to facilitate generalisable VMR. To address the limitations of image-text pre-training models in capturing video changes, we propose a generic method, referred to as Visual-Dynamic Injection (VDI), to empower the model's understanding of video moments. Whilst existing VMR methods focus on building temporal-aware video features, being aware of the text descriptions about the temporal changes is also critical but originally overlooked in pre-training by matching static images with sentences. Therefore, we extract visual context and spatial dynamic information from video frames and explicitly enforce their alignments with the phrases describing video changes (e.g. verb). By doing so, the potentially relevant visual and motion patterns in videos are encoded in the corresponding text embeddings (injected) so as to enable more accurate video-text alignments. We conduct extensive experiments on two VMR benchmark datasets (Charades-STA and ActivityNet-Captions) and achieve state-of-the-art performance. In particular, VDI yields notable advantages when tested on the out-of-distribution splits where the testing samples involve novel scenes and vocabulary.

Keywords: Vision, language, and reasoning


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
