ISBN (print): 9798350353013; 9798350353006
Image and video analysis requires not only accurate object detection but also the understanding of relationships among detected objects. Common solutions to relation modeling typically resort to stand-alone object detectors followed by non-differentiable post-processing techniques. Recently introduced detection transformers (DETR) perform end-to-end object detection based on a bipartite matching loss. Such methods, however, lack the ability to jointly detect objects and resolve object associations. In this paper, we build on the DETR approach and extend it to the joint detection of objects and their relationships by introducing an approximated bipartite matching. While our method can generalize to an arbitrary number of objects, here we focus on the modeling of object pairs and their relations. In particular, we apply our method, PairDETR, to the problem of detecting human bodies and faces and associating them to the same person. Our approach not only eliminates the need for hand-designed post-processing but also achieves excellent results for body-face associations. We evaluate PairDETR on the challenging CrowdHuman and CityPersons datasets and demonstrate a large improvement over the state of the art. Our training code and pre-trained models are available at https://***/mts-ai/pairdetr
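To make the pair-level matching idea concrete, below is a minimal sketch of a bipartite matching cost computed over body-face box pairs, in the spirit of PairDETR; the function name, the plain L1 box cost, and the weight are illustrative assumptions rather than the authors' implementation.

import torch
from scipy.optimize import linear_sum_assignment

def match_pairs(pred_body, pred_face, gt_body, gt_face, w_l1=1.0):
    """Match predicted (body, face) box pairs to ground-truth pairs.

    pred_body, pred_face: [num_queries, 4] boxes predicted by one query each
    gt_body, gt_face:     [num_gt, 4] boxes of the same annotated persons
    Returns (pred_idx, gt_idx) index arrays from the Hungarian assignment.
    """
    # The cost of assigning a query to a ground-truth pair sums the per-box
    # L1 costs, so a query is preferred only when both of its boxes agree.
    cost_body = torch.cdist(pred_body, gt_body, p=1)   # [Q, G]
    cost_face = torch.cdist(pred_face, gt_face, p=1)   # [Q, G]
    cost = w_l1 * (cost_body + cost_face)
    pred_idx, gt_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    return pred_idx, gt_idx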
ISBN (print): 9798350353006
Prompt learning has emerged as a potent and resource-efficient technique in large vision-language models (VLMs). However, its application in adapting VLMs to specialized domains like remote sensing and medical imaging, termed domain prompt learning, remains relatively unexplored. Although large-scale domain-specific foundation models offer a potential solution, their focus on a singular vision level presents challenges in prompting both vision and language modalities. To address this limitation, we propose leveraging domain-specific knowledge from these foundation models to transfer the robust recognition abilities of VLMs from generalized to specialized domains, employing quaternion networks. Our method entails utilizing domain-specific vision features from domain-specific foundation models to guide the transformation of generalized contextual embeddings from the language branch into a specialized space within quaternion networks. Furthermore, we introduce a hierarchical approach that derives vision prompt features by analyzing intermodal relationships between hierarchical language prompt features and domain-specific vision features. Through this mechanism, quaternion networks can effectively explore intermodal relationships in specific domains, facilitating domain-specific vision-language contrastive learning. Extensive experiments conducted on domain-specific datasets demonstrate that our proposed method achieves new state-of-the-art results in prompt learning. Code is available at https://***/caoql98/DPLQ.
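As an illustration of the quaternion-network idea, the sketch below uses a Hamilton product to let domain-specific vision features steer generalized language context embeddings toward a specialized space; the tensor shapes and the simple four-way channel split are assumptions for exposition, not the paper's code.

import torch

def hamilton_product(q, p):
    """Hamilton product of two quaternion-valued feature tensors [..., 4*d]."""
    a1, b1, c1, d1 = q.chunk(4, dim=-1)
    a2, b2, c2, d2 = p.chunk(4, dim=-1)
    return torch.cat([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ], dim=-1)

# Hypothetical usage: transform generalized context embeddings from the language
# branch, guided by vision features from a domain-specific foundation model.
ctx = torch.randn(16, 512)       # generalized prompt context (assumed dims)
vision = torch.randn(16, 512)    # domain-specific vision features (assumed dims)
specialized_ctx = hamilton_product(ctx, vision)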
ISBN (print): 9798350353006
Current methods for 2D and 3D object understanding struggle with severe occlusions in busy urban environments, partly due to the lack of large-scale labeled groundtruth annotations for learning occlusion. In this work, we introduce a novel framework for automatically generating a large, realistic dataset of dynamic objects under occlusions using freely available time-lapse imagery. By leveraging off-the-shelf 2D (bounding box, segmentation, keypoint) and 3D (pose, shape) predictions as pseudo-groundtruth, unoccluded 3D objects are identified automatically and composited into the background in a clip-art style, ensuring realistic appearances and physically accurate occlusion configurations. The resulting clip-art image with pseudo-groundtruth enables efficient training of object reconstruction methods that are robust to occlusions. Our method demonstrates significant improvements in both 2D and 3D reconstruction, particularly in scenarios with heavily occluded objects like vehicles and people in urban scenes.
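A rough sketch of the clip-art style compositing step: an unoccluded object crop, obtained from off-the-shelf segmentation, is pasted onto a time-lapse background frame so the occlusion order is known by construction. The function and the paste-in-order rule are hypothetical, not the paper's pipeline.

import numpy as np

def composite(background, object_rgb, object_mask, x, y):
    """Paste object_rgb (h, w, 3) onto background at (x, y) using a binary mask.

    Pasting objects back-to-front means later pastes occlude earlier ones, so the
    resulting occlusion configuration is physically plausible and fully labeled.
    Assumes the crop fits inside the background.
    """
    out = background.copy()
    h, w = object_mask.shape
    region = out[y:y + h, x:x + w]
    m = object_mask.astype(bool)[..., None]            # (h, w, 1) for broadcasting
    out[y:y + h, x:x + w] = np.where(m, object_rgb, region)
    return out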
ISBN (print): 9798350353006
Image-language models with prompt learning have shown remarkable advances in numerous downstream vision tasks. Nevertheless, conventional prompt learning methods overfit their training distribution and lose the generalization ability on test distributions. To improve generalization across various distribution shifts, we propose any-shift prompting: a general probabilistic inference framework that considers the relationship between training and test distributions during prompt learning. We explicitly connect training and test distributions in the latent space by constructing training and test prompts in a hierarchical architecture. Within this framework, the test prompt exploits the distribution relationships to guide the generalization of the CLIP image-language model from training to any test distribution. To effectively encode the distribution information and their relationships, we further introduce a transformer inference network with a pseudo-shift training mechanism. The network generates the tailored test prompt with both training and test information in a feedforward pass, avoiding extra training costs at test time. Extensive experiments on twenty-three datasets demonstrate the effectiveness of any-shift prompting on the generalization over various distribution shifts.
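The sketch below illustrates one way a transformer inference network could generate a tailored test prompt in a single feedforward pass from training prompts and a test image feature; the module sizes, token layout, and class name are assumptions for illustration only.

import torch
import torch.nn as nn

class PromptInferenceNet(nn.Module):
    def __init__(self, dim=512, n_ctx=4, n_layers=2, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.slots = nn.Parameter(torch.randn(n_ctx, dim))   # test-prompt slots

    def forward(self, train_prompt, test_image_feat):
        # train_prompt: [n_train, dim], test_image_feat: [B, dim].
        # Concatenate learnable slots, the training prompt, and the test image
        # feature, then read the refined slots back out as the test prompt.
        B = test_image_feat.size(0)
        tokens = torch.cat([
            self.slots.unsqueeze(0).expand(B, -1, -1),
            train_prompt.unsqueeze(0).expand(B, -1, -1),
            test_image_feat.unsqueeze(1),
        ], dim=1)
        out = self.encoder(tokens)
        return out[:, : self.slots.size(0)]   # tailored test prompt, no extra training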
ISBN (print): 9798350353006
Diffusion models (DMs) have ushered in a new era of generative modeling and offer more opportunities for efficiently generating high-quality and realistic data samples. However, their widespread use has also brought forth new challenges in model security, which motivates the creation of more effective adversarial attacks on DMs to understand their vulnerabilities. We propose CAAT, a simple but generic and efficient approach that does not require costly training to effectively fool latent diffusion models (LDMs). The approach is based on the observation that cross-attention layers exhibit higher sensitivity to gradient change, allowing subtle perturbations on published images to significantly corrupt the generated images. We show that a subtle perturbation of an image can significantly impact the cross-attention layers, thus changing the mapping between text and image during the fine-tuning of customized diffusion models. Extensive experiments demonstrate that CAAT is compatible with diverse diffusion models and outperforms baseline attack methods in a more effective (more noise) and efficient (twice as fast as Anti-DreamBooth and Mist) manner.
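For intuition, here is a hedged sketch of a PGD-style perturbation loop that maximizes a surrogate loss derived from a latent diffusion model's cross-attention activations; loss_fn, the budget, and the step size are placeholders, and this is not the released CAAT code.

import torch

def perturb(image, loss_fn, eps=8 / 255, alpha=1 / 255, steps=40):
    """Return an adversarially perturbed copy of image (values in [0, 1]).

    loss_fn is assumed to run the image through the LDM and return a scalar that
    grows as the cross-attention maps are corrupted.
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(image + delta)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                   # gradient ascent step
            delta.clamp_(-eps, eps)                              # L-infinity budget
            delta.copy_((image + delta).clamp(0, 1) - image)     # keep a valid image
        delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()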
ISBN (print): 9798350353006
Post-training quantization (PTQ) is an efficient model compression technique that quantizes a pretrained full-precision model using only a small calibration set of unlabeled samples without retraining. PTQ methods for convolutional neural networks (CNNs) provide quantization results comparable to full-precision counterparts. Directly applying them to vision transformers (ViTs), however, incurs severe performance degradation, mainly due to the differences in architectures between CNNs and ViTs. In particular, the distribution of activations for each channel varies drastically according to input instances, making PTQ methods for CNNs inappropriate for ViTs. To address this, we introduce instance-aware group quantization for ViTs (IGQ-ViT). Specifically, we propose to split the channels of activation maps into multiple groups dynamically for each input instance, such that activations within each group share similar statistical properties. We also extend our scheme to quantize softmax attentions across tokens. In addition, the number of groups for each layer is adjusted to minimize the discrepancies between predictions from quantized and full-precision models, under a bit-operation (BOP) constraint. We show extensive experimental results on image classification, object detection, and instance segmentation, with various transformer architectures, demonstrating the effectiveness of our approach.
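The following sketch conveys the core of instance-aware group quantization: for each input instance, channels with similar dynamic range are grouped so that every group shares one quantization scale. Grouping by sorted per-channel range and the symmetric quantizer are simplifying assumptions, not the exact IGQ-ViT procedure.

import torch

def group_quantize(x, num_groups=8, num_bits=4):
    """Quantize activations x of shape [tokens, channels] for one input instance."""
    qmax = 2 ** (num_bits - 1) - 1
    # Sort channels by their dynamic range and split them into groups, so that
    # channels sharing a scale have similar statistics for this instance.
    order = x.abs().amax(dim=0).argsort()
    x_q = torch.empty_like(x)
    for idx in order.chunk(num_groups):
        scale = x[:, idx].abs().amax().clamp_min(1e-8) / qmax
        x_q[:, idx] = (x[:, idx] / scale).round().clamp(-qmax, qmax) * scale
    return x_q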
ISBN (print): 9798350353006
There has been a growing interest in the task of generating sound for silent videos, primarily because of its practicality in streamlining video post-production. However, existing methods for video-sound generation attempt to directly create sound from visual representations, which can be challenging due to the difficulty of aligning visual representations with audio representations. In this paper, we present SonicVisionLM, a novel framework aimed at generating a wide range of sound effects by leveraging vision-language models (VLMs). Instead of generating audio directly from video, we use the capabilities of powerful VLMs. When provided with a silent video, our approach first identifies events within the video using a VLM to suggest possible sounds that match the video content. This shift in approach transforms the challenging task of aligning image and audio into the more well-studied sub-problems of aligning image-to-text and text-to-audio through popular diffusion models. To improve the quality of audio recommendations with LLMs, we have collected an extensive dataset that maps text descriptions to specific sound effects and developed a time-controlled audio adapter. Our approach surpasses current state-of-the-art methods for converting video to audio, enhancing synchronization with the visuals and improving alignment between audio and video components. Project page: https://***.io/***/
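At a high level, the video-to-text-to-audio pipeline can be sketched as below; describe_events and text_to_audio stand in for the VLM and the text-to-audio diffusion model with the time-controlled adapter, and are placeholders rather than real library calls.

def generate_sound_effects(video_frames, describe_events, text_to_audio):
    # 1) Ask a vision-language model which sound-producing events occur and when,
    #    e.g. [{"t": 2.4, "text": "a door slams"}].
    events = describe_events(video_frames)
    # 2) Turn each suggested event into audio with a text-to-audio diffusion model,
    #    conditioning on the onset time so the adapter keeps audio and video in sync.
    return [text_to_audio(event["text"], onset=event["t"]) for event in events]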
ISBN (print): 9798350353006
In this paper, we tackle the problem of self-supervised video alignment and activity progress prediction using in-the-wild videos. Our proposed self-supervised representation learning method carefully addresses different action orderings, redundant actions, and background frames to generate improved video representations compared to previous methods. Our model generalizes temporal cycle-consistency learning to allow for more flexibility in determining cycle-consistent neighbors. More specifically, to handle repeated actions, we propose a multi-neighbor cycle consistency and a multi-cycle-back regression loss by finding multiple soft nearest neighbors using a Gaussian Mixture Model. To handle background and redundant frames, we introduce a context-dependent drop function in our framework, discouraging the alignment of droppable frames. Furthermore, to learn from videos of multiple activities jointly, we propose a multi-head cross-task network, allowing us to embed a video and estimate progress without knowing its activity label. Experiments on multiple datasets show that our method outperforms the state of the art for video alignment and progress prediction.
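For reference, the single-neighbor form of soft cycle-back regression that the multi-neighbor, GMM-based variant generalizes can be sketched as follows; the temperature and the Euclidean distance are assumptions, not the paper's exact formulation.

import torch
import torch.nn.functional as F

def cycle_back_regression(u, v, i, temperature=0.1):
    """u: [N, D] frame embeddings of video 1, v: [M, D] of video 2, i: query index."""
    # Soft nearest neighbor of frame i of video 1 inside video 2.
    alpha = F.softmax(-torch.cdist(u[i:i + 1], v) / temperature, dim=-1)   # [1, M]
    v_tilde = alpha @ v                                                    # [1, D]
    # Cycle back: the expected index where v_tilde lands in video 1 should be i.
    beta = F.softmax(-torch.cdist(v_tilde, u) / temperature, dim=-1)       # [1, N]
    frame_ids = torch.arange(u.size(0), dtype=u.dtype, device=u.device)
    mu = (beta * frame_ids).sum()
    return (mu - i) ** 2   # cycle-back regression loss for this query frame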
ISBN (print): 9798350353006
Inspired by the success of general-purpose models in NLP, recent studies attempt to unify different vision tasks in the same sequence format and employ autoregressive Transformers for sequence prediction. They apply uni-directional attention to capture sequential dependencies and generate task sequences recursively. However, such autoregressive Transformers may not fit vision tasks well, as vision task sequences usually lack the sequential dependencies typically observed in natural languages. In this work, we design Masked AutoDecoder (MAD), an effective multi-task vision generalist. MAD consists of two core designs. First, we develop a parallel decoding framework that introduces bi-directional attention to capture contextual dependencies comprehensively and decode vision task sequences in parallel. Second, we design a masked sequence modeling approach that learns rich task contexts by masking and reconstructing task sequences. In this way, MAD handles all the tasks by a single network branch and a simple cross-entropy loss with minimal task-specific designs. Extensive experiments demonstrate the great potential of MAD as a new paradigm for unifying various vision tasks. MAD achieves superior performance and inference efficiency compared to autoregressive counterparts while obtaining competitive accuracy with task-specific models. Code will be released at https://***/hanqiu-hq/MAD.
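A compact sketch of the masked sequence modeling objective: a fraction of task tokens is masked, all positions are decoded in parallel with bi-directional attention conditioned on image features, and cross-entropy is applied on the masked positions. The decoder and embed callables, the mask ratio, and the token vocabulary are placeholders, not the released MAD code.

import torch
import torch.nn.functional as F

def masked_decoding_loss(decoder, embed, task_tokens, image_feats,
                         mask_token_id, mask_ratio=0.5):
    """task_tokens: [B, L] discretized task sequence, image_feats: [B, N, D].

    decoder is assumed to map embedded tokens plus image features to per-token
    logits [B, L, vocab] using bi-directional (non-causal) attention.
    """
    mask = torch.rand_like(task_tokens, dtype=torch.float) < mask_ratio
    inputs = task_tokens.masked_fill(mask, mask_token_id)
    logits = decoder(embed(inputs), image_feats)
    # Reconstruct only the masked positions; all tasks share this single loss.
    return F.cross_entropy(logits[mask], task_tokens[mask])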
ISBN (print): 9798350353006
Significant progress in video question answering (VideoQA) has been made thanks to thriving large image-language pretraining frameworks. Although image-language models can efficiently represent both video and language branches, they typically employ goal-free vision perception and do not let vision interact well with language during answer generation, thus omitting crucial visual cues. In this paper, we take inspiration from the human recognition and learning pattern and propose VideoDistill, a framework with language-aware (i.e., goal-driven) behavior in both vision perception and answer generation. VideoDistill generates answers only from question-related visual embeddings and follows a thinking-observing-answering approach that closely resembles human behavior, distinguishing it from previous research. Specifically, we develop a language-aware gating mechanism to replace the standard cross-attention, avoiding the direct fusion of language into visual representations. We incorporate this mechanism into two key components of the framework. The first component is a differentiable sparse sampling module, which selects frames containing the necessary dynamics and semantics relevant to the questions. The second component is a vision refinement module that merges existing spatial-temporal attention layers to ensure the extraction of multi-grained visual semantics associated with the questions. We conduct evaluations on various challenging video question answering benchmarks, and VideoDistill achieves state-of-the-art performance on both general and long-form VideoQA datasets. In addition, we verify that VideoDistill can effectively reduce the reliance on language shortcut solutions in the EgoTaskQA dataset.
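As an illustration of a language-aware gate that modulates, rather than fuses, visual features with the question, consider the sketch below; the dimensions and the single linear gate are assumptions, and this is not the authors' module.

import torch
import torch.nn as nn

class LanguageAwareGate(nn.Module):
    def __init__(self, q_dim=512, v_dim=768):
        super().__init__()
        self.to_gate = nn.Linear(q_dim, v_dim)

    def forward(self, visual_feats, question_emb):
        # visual_feats: [B, T, v_dim] frame features, question_emb: [B, q_dim].
        # The question only scales visual channels, so language is never written
        # directly into the visual representation as in standard cross-attention.
        gate = torch.sigmoid(self.to_gate(question_emb)).unsqueeze(1)   # [B, 1, v_dim]
        return gate * visual_feats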