Search Results - Inner Mongolia University Library

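Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

In these examples, 图书馆学 means "library science", 情报学 means "information science", 范并思 is an author name (Fan Bingsi), and 计算机应用与软件 is the journal "Computer Applications and Software".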
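The search rules above can be sketched as a small query builder. This is only an illustration under stated assumptions: the helper names `term`, `combine`, and `group` are hypothetical and not part of the library system; only the field codes, the three uppercase operators, and the single-space padding come from the help text.

```python
# Field codes and operators as listed in the help text above.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes (hypothetical helper)."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild Example 1 from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; whether the real system rejects or silently ignores a lowercase operator is not stated in the help text.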



Search query: "Any field = 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009"

20,951 records in total; showing records 71-80


X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kukleva, Anna; Sener, Fadime; Remelli, Edoardo; Tekin, Bugra; Sauser, Eric; Schiele, Bernt; Mal, Shugao
Affiliations: Meta Real Labs, Menlo Pk, CA 94025, USA; Max Planck Inst Informat, Saarland Informat Campus, Saarbrucken, Germany

ISBN (print): 9798350353006

Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition. However, the adaptation of these models to egocentric videos has been largely unexplored. To address this gap, we propose a simple yet effective cross-modal adaptation framework, which we call X-MIC. Using a video adapter, our pipeline learns to align frozen text embeddings to each egocentric video directly in the shared embedding space. Our novel adapter architecture retains and improves generalization of the pre-trained VLMs by disentangling learnable temporal modeling and frozen visual encoder. This results in an enhanced alignment of text embeddings to each egocentric video, leading to a significant improvement in cross-dataset generalization. We evaluate our approach on the Epic-Kitchens, Ego4D, and EGTEA datasets for fine-grained cross-dataset action generalization, demonstrating the effectiveness of our method.

Keywords: action recognition adapters egocentric generalization prompts VLMs adaptation zero-shot

Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Qi, Fan; Li, Shuai
Affiliations: Tianjin Univ Technol, Tianjin, Peoples R China

ISBN (digital): 9798350353006

ISBN (print): 9798350353006

In Federated Learning (FL), the issue of statistical data heterogeneity has been a significant challenge to the field's ongoing development. This problem is further exacerbated when clients' data vary in modalities. In response to these issues of statistical heterogeneity and modality incompatibility, we propose the Adaptive Hyper-graph Aggregation framework, a novel solution for Modality-Agnostic Federated Learning. We design a Modular Architecture for Local Model with single modality, setting the stage for efficient intra-modality sharing and inter-modality complementarity. An innovative Global Consensus Prototype Enhancer is crafted to assimilate and broadcast global consensus knowledge within the network. At the core of our approach lies the Adaptive Hyper-graph Learning Strategy, which effectively tackles the inherent challenges of modality incompatibility and statistical heterogeneity within federated learning environments, accomplishing this adaptively even without the server being aware of the clients' modalities. Our approach, tested on three multimodal benchmark datasets, demonstrated strong performance across diverse data distributions, affirming its effectiveness in multimodal federated learning.

Keywords: Knowledge engineering; Computer vision; Adaptation models; Federated learning; Prototypes; Computer architecture; Benchmark testing

Cinematic Behavior Transfer via NeRF-based Differentiable Filming

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jiang, Xuekun; Rao, Anyi; Wang, Jingbo; Lin, Dahua; Dai, Bo
Affiliations: Shanghai AI Lab, Shanghai, Peoples R China; Stanford Univ, Stanford, CA 94305, USA; Chinese Univ Hong Kong, Hong Kong, Peoples R China

ISBN (print): 9798350353013; 9798350353006

In the evolving landscape of digital media and video production, the precise manipulation and reproduction of visual elements like camera movements and character actions are highly desired. Existing SLAM methods face limitations in dynamic scenes and human pose estimation often focuses on 2D projections, neglecting 3D statuses. To address these issues, we first introduce a reverse filming behavior estimation technique. It optimizes camera trajectories by leveraging NeRF as a differentiable renderer and refining SMPL tracks. We then introduce a cinematic transfer pipeline that is able to transfer various shot types to a new 2D video or a 3D virtual environment. The incorporation of 3D engine workflow enables superior rendering and control abilities, which also achieves a higher rating in the user study.

Keywords: Three dimensional computer graphics

Streaming Dense Video Captioning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhou, Xingyi; Arnab, Anurag; Buch, Shyamal; Yan, Shen; Myers, Austin; Xiong, Xuehan; Nagrani, Arsha; Schmid, Cordelia
Affiliations: Google, Mountain View, CA 94043, USA

ISBN (print): 9798350353006

An ideal model for dense video captioning - predicting captions localized temporally in a video - should be able to handle long input videos, predict rich, detailed textual descriptions, and be able to produce outputs before processing the entire video. Current state-of-the-art models, however, process a fixed number of downsampled frames, and make a single full prediction after seeing the whole video. We propose a streaming dense video captioning model that consists of two novel components: First, we propose a new memory module, based on clustering incoming tokens, which can handle arbitrarily long videos as the memory is of a fixed size. Second, we develop a streaming decoding algorithm that enables our model to make predictions before the entire video has been processed. Our model achieves this streaming ability, and significantly improves the state-of-the-art on three dense video captioning benchmarks: ActivityNet, YouCook2 and ViTT. Our code is released at https://***/google-research/scenic.

Keywords: captioning; long video; streaming video; video captioning; vision and language

JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Jahangard, Simindokht; Cai, Zhixi; Wen, Shiki; Rezatofighi, Hamid
Affiliations: Monash Univ, Clayton, Vic, Australia

ISBN (print): 9798350353006

Understanding human social behaviour is crucial in computer vision and robotics. Micro-level observations like individual actions fall short, necessitating a comprehensive approach that considers individual behaviour, intra-group dynamics, and social group levels for a thorough understanding. To address dataset limitations, this paper introduces JRDB-Social, an extension of JRDB [2]. Designed to fill gaps in human understanding across diverse indoor and outdoor social contexts, JRDB-Social provides annotations at three levels: individual attributes, intra-group interactions, and social group context. This dataset aims to enhance our grasp of human social dynamics for robotic applications. Utilizing the recent cutting-edge multi-modal large language models, we evaluated our benchmark to explore their capacity to decipher social human behaviour.

Keywords: dataset human attributes human human interaction human social behaviour understanding interaction large language model multifaceted robotic dataset social group social robot vision language model visual question answering visual reasoning

Semantics-aware Motion Retargeting with Vision-Language Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhang, Haodong; Chen, Zhike; Xu, Haocheng; Hao, Lei; Wu, Xiaofei; Xu, Songcen; Zhang, Zhensong; Wang, Yue; Xiong, Rong
Affiliations: Zhejiang Univ, Hangzhou, Peoples R China; Huawei Noahs Ark Lab, Montreal, PQ, Canada

ISBN (print): 9798350353013; 9798350353006

Capturing and preserving motion semantics is essential to motion retargeting between animation characters. However, most of the previous works neglect the semantic information or rely on human-designed joint-level representations. Here, we present a novel Semantics-aware Motion reTargeting (SMT) method with the advantage of vision-language models to extract and maintain meaningful motion semantics. We utilize a differentiable module to render 3D motions. Then the high-level motion semantics are incorporated into the motion retargeting process by feeding the vision-language model with the rendered images and aligning the extracted semantic embeddings. To ensure the preservation of fine-grained motion details and high-level semantics, we adopt a two-stage pipeline consisting of skeleton-aware pre-training and fine-tuning with semantics and geometry constraints. Experimental results show the effectiveness of the proposed method in producing high-quality motion retargeting results while accurately preserving motion semantics. Project page can be found at https://***/view/smtnet.

Keywords: Animation; Motion Retargeting; Vision-Language Model

Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ke, Bingxin; Obukhov, Anton; Huang, Shengyu; Metzger, Nando; Daudt, Rodrigo Caye; Schindler, Konrad
Affiliations: Swiss Fed Inst Technol, Photogrammetry & Remote Sensing, Zurich, Switzerland

ISBN (print): 9798350353006

Monocular depth estimation is a fundamental computer vision task. Recovering 3D depth from a single image is geometrically ill-posed and requires scene understanding, so it is not surprising that the rise of deep learning has led to a breakthrough. The impressive progress of monocular depth estimators has mirrored the growth in model capacity, from relatively modest CNNs to large Transformer architectures. Still, monocular depth estimators tend to struggle when presented with images with unfamiliar content and layout, since their knowledge of the visual world is restricted by the data seen during training, and challenged by zero-shot generalization to new domains. This motivates us to explore whether the extensive priors captured in recent generative diffusion models can enable better, more generalizable depth estimation. We introduce Marigold, a method for affine-invariant monocular depth estimation that is derived from Stable Diffusion and retains its rich prior knowledge. The estimator can be fine-tuned in a couple of days on a single GPU using only synthetic training data. It delivers state-of-the-art performance across a wide range of datasets, including over 20% performance gains in specific cases. Project page: https://***.

Keywords: ddim; ddpm; depth estimation; diffusion; generative; LDM; vision

SnAG: Scalable and Accurate Video Grounding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mu, Fangzhou; Mo, Sicheng; Li, Yin
Affiliations: Univ Wisconsin Madison, Madison, WI 53706, USA

ISBN (print): 9798350353006

Temporal grounding of text descriptions in videos is a central problem in vision-language learning and video understanding. Existing methods often prioritize accuracy over scalability - they have been optimized for grounding only a few text queries within short videos, and fail to scale up to long videos with hundreds of queries. In this paper, we study the effect of cross-modal fusion on the scalability of video grounding models. Our analysis establishes late fusion as a more cost-effective fusion scheme for long-form videos with many text queries. Moreover, it leads us to a novel, video-centric sampling scheme for efficient training. Based on these findings, we present SnAG, a simple baseline for scalable and accurate video grounding. Without bells and whistles, SnAG is 43% more accurate and 1.5x faster than CONE, a state of the art for long-form video grounding on the challenging MAD dataset, while achieving highly competitive results on short videos. Our code is available at https://***/fmu2/snag_release.

Keywords: Temporal Sentence Grounding; Video understanding; Vision-Language Learning

LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhao, Qihao; Dai, Yalun; Li, Hao; Hu, Wei; Zhang, Fan; Liu, Jun
Affiliations: Beijing Univ Chem Technol, Beijing, Peoples R China; Singapore Univ Technol & Design, Singapore, Singapore; Nanyang Technol Univ, Singapore, Singapore; Northwestern Polytech Univ, Xian, Peoples R China

ISBN (print): 9798350353006

Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories. In this paper, we propose a novel generative and fine-tuning framework, LTGC, to handle long-tail recognition via leveraging generated content. Firstly, inspired by the rich implicit knowledge in large-scale models (e.g., large language models, LLMs), LTGC leverages the power of these models to parse and reason over the original tail data to produce diverse tail-class content. We then propose several novel designs for LTGC to ensure the quality of the generated data and to efficiently fine-tune the model using both the generated and original data. The visualization demonstrates the effectiveness of the generation module in LTGC, which produces accurate and diverse tail data. Additionally, the experimental results demonstrate that our LTGC outperforms existing state-of-the-art methods on popular long-tailed benchmarks.


Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yu, Qing; Tanaka, Mikihiro; Fujiwara, Kent
Affiliations: LY Corp, Tokyo, Japan

ISBN (print): 9798350353013; 9798350353006

To build a cross-modal latent space between 3D human motion and language, acquiring large-scale and high-quality human motion data is crucial. However, unlike the abundance of image data, the scarcity of motion data has limited the performance of existing motion-language models. To counter this, we introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning, aiming to extract useful knowledge from the image domain and apply it to the motion domain. These motion patches, created by dividing and sorting skeleton joints based on body parts in motion sequences, are robust to varying skeleton structures, and can be regarded as color image patches in ViT. We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis, presenting a promising direction for addressing the issue of limited motion data. Our extensive experiments show that the proposed motion patches, used jointly with ViT, achieve state-of-the-art performance in the benchmarks of text-to-motion retrieval, and other novel challenging tasks, such as cross-skeleton recognition, zero-shot motion classification, and human interaction recognition, which are currently impeded by the lack of data.

Keywords: Motion Representation; Motion-Language Models; Text-Motion Retrieval


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
