检索结果-内蒙古大学图书馆

您好，读者！请登录

内蒙古大学图书馆

首页
概况
党建
资源
服务
科研支持
- 论文收录引用证明
- 科技查新
知识产权
档案馆
帮助

咨询与建议

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

您的常用邮箱：*

您的手机号码：*

问题描述：

当前已输入0个字，您还可以输入200个字

全部搜索
期刊论文
图书
学位论文
标准
纸本馆藏
外文资源发现
数据库导航
超星发现

高级检索

时间限定

出版年份：

文献类型

图书期刊文献学位论文多媒体

馆藏选择

电子馆藏纸本馆藏

核心期刊

全部期刊 SCI 收录期刊 SSCI 收录期刊 EI 收录期刊 CSCD 收录期刊 CSSCI 收录期刊

语言

中文英文

文献类型

期刊文献图书学位论文标准纸本馆藏

帮助

文字说明：

T=题名（书名、题名），A=作者（责任者），K=主题词，P=出版物名称，PU=出版社名称，O=机构（作者单位、学位授予单位、专利申请人），L=中图分类号，C=学科分类号，U=全部字段，Y=年（出版发行年、学位年度、标准发布年）

检索规则说明：

AND代表“并且”；OR代表“或者”；NOT代表“不包含”；(注意必须大写,运算符两边需空一格)

检索范例：

范例一：(K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
范例二：P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016

分类表

所选分类

>> <<

限定检索结果

文献类型

20,994 篇 会议
99 册 图书
85 篇 期刊文献
1 篇 学位论文

馆藏范围

21,178 篇 电子文献
1 种 纸本馆藏

日期分布

学科分类号

13,603 篇 工学
- 11,179 篇 计算机科学与技术...
- 2,631 篇 机械工程
- 2,542 篇 软件工程
- 990 篇 光学工程
- 849 篇 电气工程
- 676 篇 控制科学与工程
- 487 篇 信息与通信工程
- 242 篇 仪器科学与技术
- 215 篇 测绘科学与技术
- 159 篇 生物医学工程（可授...
- 150 篇 生物工程
- 139 篇 电子科学与技术（可...
- 69 篇 安全科学与工程
- 67 篇 化学工程与技术
- 55 篇 建筑学
- 53 篇 土木工程
- 43 篇 力学（可授工学、理...
- 41 篇 航空宇航科学与技...
3,462 篇 医学
- 3,452 篇 临床医学
- 41 篇 基础医学(可授医学...
2,483 篇 理学
- 1,247 篇 数学
- 1,213 篇 物理学
- 446 篇 统计学（可授理学、...
- 418 篇 生物学
- 269 篇 系统科学
- 67 篇 化学
424 篇 管理学
- 218 篇 管理科学与工程(可...
- 217 篇 图书情报与档案管...
- 43 篇 工商管理
144 篇 艺术学
- 142 篇 设计学（可授艺术学...
41 篇 法学
31 篇 农学
12 篇 经济学
10 篇 教育学
6 篇 文学
3 篇 军事学

主题

8,072 篇 computer vision
2,879 篇 pattern recognit...
2,859 篇 training
1,808 篇 computational mo...
1,718 篇 visualization
1,478 篇 cameras
1,381 篇 shape
1,374 篇 face recognition
1,364 篇 three-dimensiona...
1,342 篇 feature extracti...
1,269 篇 image segmentati...
1,156 篇 robustness
1,109 篇 semantics
982 篇 layout
978 篇 object detection
953 篇 computer archite...
952 篇 benchmark testin...
931 篇 codes
918 篇 object recogniti...
899 篇 computer science

机构

174 篇 univ sci & techn...
154 篇 carnegie mellon ...
149 篇 univ chinese aca...
144 篇 chinese univ hon...
110 篇 microsoft resear...
104 篇 zhejiang univ pe...
98 篇 swiss fed inst t...
93 篇 tsinghua univ pe...
92 篇 tsinghua univers...
90 篇 microsoft res as...
88 篇 shanghai ai lab ...
83 篇 zhejiang univers...
76 篇 alibaba grp peop...
74 篇 hong kong univ s...
73 篇 university of sc...
72 篇 peking univ peop...
68 篇 shanghai jiao to...
68 篇 university of ch...
66 篇 google res mount...
66 篇 univ oxford oxfo...

作者

83 篇 van gool luc
71 篇 zhang lei
60 篇 timofte radu
49 篇 yang yi
49 篇 luc van gool
48 篇 xiaoou tang
43 篇 darrell trevor
43 篇 tian qi
42 篇 loy chen change
42 篇 sun jian
41 篇 qi tian
37 篇 vasconcelos nuno
37 篇 liu yang
37 篇 chen xilin
37 篇 li fei-fei
36 篇 liu xiaoming
36 篇 shan shiguang
36 篇 li stan z.
36 篇 torralba antonio
33 篇 zhou jie

语言

21,137 篇 英文
31 篇 中文
5 篇 土耳其文
4 篇 其他
2 篇 日文

检索条件"任意字段=2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011"

共 21179 条记录，以下是221-230 订阅

全选清除本页清除全部题录导出标记到"检索档案"

详细简洁

排序：

Gradient Reweighting: Towards Imbalanced Class-Incremental Learning

Gradient Reweighting: Towards Imbalanced Class-Incremental L...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： He, Jiangpeng Purdue Univ Elmore Family Sch Elect & Comp Engn W Lafayette IN 47907 USA

ISBN: (纸本)9798350353006

Class-Incremental Learning (CIL) trains a model to continually recognize new classes from non-stationary data while retaining learned knowledge. A major challenge of CIL arises when applying to real-world data characterized by non-uniform distribution, which introduces a dual imbalance problem involving (i) disparities between stored exemplars of old tasks and new class data (inter-phase imbalance), and (ii) severe class imbalances within each individual task (intra-phase imbalance). We show that this dual imbalance issue causes skewed gradient updates with biased weights in FC layers, thus inducing over/under-fitting and catastrophic forgetting in CIL. Our method addresses it by reweighting the gradients towards balanced optimization and unbiased classifier learning. Additionally, we observe imbalanced forgetting where paradoxically the instance-rich classes suffer higher performance degradation during CIL due to a larger amount of training data becoming unavailable in subsequent learning phases. To tackle this, we further introduce a distribution-aware knowledge distillation loss to mitigate forgetting by aligning output logits proportionally with the distribution of lost training data. We validate our method on CIFAR-100, ImageNetSubset, and Food101 across various evaluation protocols and demonstrate consistent improvements compared to existing works, showing great potential to apply CIL in real-world scenarios with enhanced robustness and effectiveness.

关键词： computer vision continual learniing imbalanced gradient long-tailed learning

来源：评论

学校读者我要写书评

暂无评论

Multi-Modal Hallucination Control by Visual Information Grounding

Multi-Modal Hallucination Control by Visual Information Grou...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Favero, Alessandro Zancato, Luca Trager, Matthew Choudhary, Siddharth Perera, Pramuditha Achille, Alessandro Swaminathan, Ashwin Soatto, Stefano AWS AI Labs Lausanne Switzerland

ISBN: (纸本)9798350353006

Generative vision-Language Models (VLMs) are prone to generate plausible-sounding textual answers that, however, are not always grounded in the input image. We investigate this phenomenon, usually referred to as "hallucination" and show that it stems from an excessive reliance on the language prior. In particular, we show that as more tokens are generated, the reliance on the visual prompt decreases, and this behavior strongly correlates with the emergence of hallucinations. To reduce hallucinations, we introduce Multi-Modal Mutual-Information Decoding (M3ID), a new sampling method for prompt amplification. M3ID amplifies the influence of the reference image over the language prior, hence favoring the generation of tokens with higher mutual information with the visual prompt. M3ID can be applied to any pre-trained autoregressive VLM at inference time without necessitating further training and with minimal computational overhead. If training is an option, we show that M3ID can be paired with Direct Preference Optimization ( DPO) to improve the model's reliance on the prompt image without requiring any labels. Our empirical findings show that our algorithms maintain the fluency and linguistic capabilities of pre-trained VLMs while reducing hallucinations by mitigating visually ungrounded answers. Specifically, for the LLaVA 13B model, M3ID and M3ID+DPO reduce the percentage of hallucinated objects in captioning tasks by 25% and 28%, respectively, and improve the accuracy on VQA benchmarks such as POPE by 21% and 24%.

关键词： language reasoning vision

来源：评论

学校读者我要写书评

暂无评论

Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics

Hyper-MD: Mesh Denoising with Customized Parameters Aware of...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Wang, Xingtao Wei, Hongliang Fan, Xiaopeng Zhao, Debin Harbin Inst Technol Harbin Peoples R China

ISBN: (纸本)9798350353013;9798350353006

Mesh denoising (MD) is a critical task in geometry processing, as meshes from scanning or AIGC techniques are susceptible to noise contamination. The challenge of MD lies in the diverse nature of mesh facets in terms of geometric characteristics and noise distributions. Despite recent advancements in deep learning-based MD methods, existing MD networks typically neglect the consideration of geometric characteristics and noise distributions. In this paper, we propose Hyper-MD, a hyper-network-based approach that addresses this limitation by dynamically customizing denoising parameters for each facet based on its noise intensity and geometric characteristics. Specifically, HyperMD is composed of a hyper-network and an MD network. For each noisy facet, the hyper-network takes two angles as input to customize parameters for the MD network. These two angles are specially defined to reveal the noise intensity and geometric characteristics of the current facet, respectively. The MD network receives a facet patch as input, and outputs the denoised normal using the customized parameters. Experimental results on synthetic and real-scanned meshes demonstrate that Hyper-MD outperforms state-of-the-art mesh denoising methods.

关键词： computer graphics hyper-network Mesh denoising

来源：评论

学校读者我要写书评

暂无评论

Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features

Back to 3D: Few-Shot 3D Keypoint Detection with Back-Project...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Wimmer, Thomas Wonka, Peter Ovsjanikov, Maks Ecole Polytech LIX Palaiseau France Tech Univ Munich Munich Germany KAUST Thuwal Saudi Arabia

ISBN: (纸本)9798350353013;9798350353006

With the immense growth of dataset sizes and computing resources in recent years, so-called foundation models have become popular in NLP and vision tasks. In this work, we propose to explore foundation models for the task of keypoint detection on 3D shapes. A unique characteristic of keypoint detection is that it requires semantic and geometric awareness while demanding high localization accuracy. To address this problem, we propose, first, to back-project features from large pre-trained 2D vision models onto 3D shapes and employ them for this task. We show that we obtain robust 3D features that contain rich semantic information and analyze multiple candidate features stemming from different 2D foundation models. Second, we employ a keypoint candidate optimization module which aims to match the average observed distribution of keypoints on the shape and is guided by the back-projected features. The resulting approach achieves a new state of the art for few-shot keypoint detection on the KeyPointNet dataset, almost doubling the performance of the previous best methods.

关键词： 3D vision Foundation Models Keypoint Detection Shape Analysis

来源：评论

学校读者我要写书评

暂无评论

Multiscale vision Transformers meet Bipartite Matching for efficient single-stage Action Localization

Multiscale Vision Transformers meet Bipartite Matching for e...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Ntinou, Ioanna Sanchez, Enrique Tzimiropoulos, Georgios Queen Mary Univ London London England Samsung AI Ctr Cambridge Cambridge England

ISBN: (纸本)9798350353006

Action Localization is a challenging problem that combines detection and recognition tasks, which are often addressed separately. State-of-the-art methods rely on off-the-shelf bounding box detections pre-computed at high resolution, and propose transformer models that focus on the classification task alone. Such two-stage solutions are prohibitive for real-time deployment. On the other hand, single-stage methods target both tasks by devoting part of the network (generally the backbone) to sharing the majority of the workload, compromising performance for speed. These methods build on adding a DETR head with learnable queries that after cross- and self-attention can be sent to corresponding MLPs for detecting a person's bounding box and action. However, DETR-like architectures are challenging to train and can incur in big complexity. In this paper, we observe that a straight bipartite matching loss can be applied to the output tokens of a vision transformer. This results in a backbone + MLP architecture that can do both tasks without the need of an extra encoder-decoder head and learnable queries. We show that a single MViTv2-S architecture trained with bipartite matching to perform both tasks surpasses the same MViTv2-S when trained with RoI align on pre-computed bounding boxes. With a careful design of token pooling and the proposed training pipeline, our Bipartite-Matching vision Transformer model, BMViT, achieves +3 mAP on AVA2.2. w.r.t. the two-stage MViTv2-S counterpart. Code is available at https://***/IoannaNti/BMViT

关键词： Signal encoding

来源：评论

学校读者我要写书评

暂无评论

Edit One for All: Interactive Batch Image Editing

Edit One for All: Interactive Batch Image Editing

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Thao Nguyen Ojha, Utkarsh Li, Yuheng Liu, Haotian Lee, Yong Jae Univ Wisconsin Madison Madison WI 53707 USA

ISBN: (纸本)9798350353013;9798350353006

In recent years, image editing has advanced remarkably. With increased human control, it is now possible to edit an image in a plethora of ways;from specifying in text what we want to change, to straight up dragging the contents of the image in an interactive point-based manner. However, most of the focus has remained on editing single images at a time. Whether and how we can simultaneously edit large batches of images has remained understudied. With the goal of minimizing human supervision in the editing process, this paper presents a novel method for interactive batch image editing using StyleGAN as the medium. Given an edit specified by users in an example image (e.g., make the face frontal), our method can automatically transfer that edit to other test images, so that regardless of their initial state (pose), they all arrive at the same final state (e.g., all facing front). Extensive experiments demonstrate that edits performed using our method have similar visual quality to existing single-image-editing methods, while having more visual consistency and saving significant time and human effort.

关键词： batch image editing computer vision image editing image generation image manipulation

来源：评论

学校读者我要写书评

暂无评论

OVMR: Open-Vocabulary recognition with Multi-Modal References

OVMR: Open-Vocabulary Recognition with Multi-Modal Reference...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Ma, Zehong Zhang, Shiliang Wei, Longhui Tian, Qi Peking Univ Sch Comp Sci Natl Key Lab Multimedia Informat Proc Beijing Peoples R China Huawei Inc Shenzhen Guangdong Peoples R China

ISBN: (纸本)9798350353006

The challenge of open-vocabulary recognition lies in the model has no clue of new categories it is applied to. Existing works have proposed different methods to embed category cues into the model, e.g., through few-shot fine-tuning, providing category names or textual descriptions to vision-Language Models. Fine-tuning is time-consuming and degrades the generalization capability. Textual descriptions could be ambiguous and fail to depict visual details. This paper tackles open-vocabulary recognition from a different perspective by referring to multi-modal clues composed of textual descriptions and exemplar images. Our method, named OVMR, adopts two innovative components to pursue a more robust category cues embedding. A multi-modal classifier is first generated by dynamically complementing textual descriptions with image exemplars. A preference-based refinement module is hence applied to fuse uni-modal and multi-modal classifiers, with the aim to alleviate issues of low-quality exemplar images or textual descriptions. The proposed OVMR is a plug-and-play module, and works well with exemplar images randomly crawled from the Internet. Extensive experiments have demonstrated the promising performance of OVMR, e.g., it outperforms existing methods across various scenarios and setups. Codes are publicly available at https://***/Zehong-Ma/OVMR.

关键词： Open-Vocabulary Detection Open-Vocabulary recognition

来源：评论

学校读者我要写书评

暂无评论

Unified Language-driven Zero-shot Domain Adaptation

Unified Language-driven Zero-shot Domain Adaptation

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Yang, Senqiao Tian, Zhuotao Jiang, Li Jia, Jiaya Chinese Univ Hong Kong Hong Kong Peoples R China Harbin Inst Technol Shenzhen Peoples R China Chinese Univ Hong Kong Shenzhen Peoples R China

ISBN: (纸本)9798350353006

This paper introduces Unified Language-driven Zero-shot Domain Adaptation ( ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge. We identify the constraints in the existing language-driven zero-shot domain adaptation task, particularly the requirement for domain IDs and domain-specific models, which may restrict flexibility and scalability. To overcome these issues, we propose a new framework for ULDA, consisting of Hierarchical Context Alignment (HCA), Domain Consistent Representation Learning (DCRL), and Text-Driven Rectifier (TDR). These components work synergistically to align simulated features with target text across multiple visual levels, retain semantic correlations between different regional representations, and rectify biases between simulated and real target visual features, respectively. Our extensive empirical evaluations demonstrate that this framework achieves competitive performance in both settings, surpassing even the model that requires domain-ID, showcasing its superiority and generalization ability. The proposed method is not only effective but also maintains practicality and efficiency, as it does not introduce additional computational costs during inference. The code is available on the project website(1).

关键词： Domain Adaptation Segmentation vision-Language Model

来源：评论

学校读者我要写书评

暂无评论

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning

Troika: Multi-Path Cross-Modal Traction for Compositional Ze...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Hu, Siteng Gong, Biao Feng, Yutong Zhang, Min Lv, Yiliang Wang, Donglin Zhejiang Univ Hangzhou Peoples R China Alibaba Grp Hangzhou Peoples R China Westlake Univ Sch Engn AI Div Machine Intelligence Lab MiLAB Hangzhou Peoples R China

ISBN: (纸本)9798350353006

Recent compositional zero-shot learning (CZSL) methods adapt pre-trained vision-language models (VLMs) by constructing trainable prompts only for composed state-object pairs. Relying on learning the joint representation of seen compositions, these methods ignore the explicit modeling of the state and object, thus limiting the exploitation of pre-trained knowledge and generalization to unseen compositions. With a particular focus on the universality of the solution, in this work, we propose a novel paradigm for CZSL models that establishes three identification branches (i.e., Multi-Path) to jointly model the state, object, and composition. The presented Troika is an outstanding implementation that aligns the branch-specific prompt representations with decomposed visual features. To calibrate the bias between semantically similar multi-modal representations, we further devise a Cross-Modal Traction module into Troika that shifts the prompt representation towards the current visual content. We conduct extensive experiments on three popular benchmarks, where our method significantly outperforms existing methods in both closed-world and open-world settings. The code will be available at https://***/bighuang624/Troika.

关键词： compositional zero-shot learning parameter-efficient transfer learning Troika vision-language models

来源：评论

学校读者我要写书评

暂无评论

SpatialVLM: Endowing vision-Language Models with Spatial Reasoning Capabilities

SpatialVLM: Endowing Vision-Language Models with Spatial Rea...

引用

ieee/CVF conference on computer vision and pattern recognition (cvpr)

作者： Chen, Boyuan Xu, Zhuo Kirman, Sean Ichter, Brian Sadigh, Dorsa Guibas, Leonidas Xia, Fei Google DeepMind London England Google Res Mountain View CA USA MIT 77 Massachusetts Ave Cambridge MA 02139 USA

ISBN: (纸本)9798350353006

Understanding and reasoning about spatial relationships is a fundamental capability for Visual Question Answering (VQA) and robotics. While vision Language Models (VLM) have demonstrated remarkable performance in certain VQA benchmarks, they still lack capabilities in 3D spatial reasoning, such as recognizing quantitative relationships of physical objects like distances or size difference. We hypothesize that VLMs' limited spatial reasoning capability is due to the lack of 3D spatial knowledge in training data and aim to solve this problem by training VLMs with Internet-scale spatial reasoning data. To this end, we present a system to facilitate this approach. We first develop an automatic 3D spatial VQA data generation framework that scales up to 2 billion VQA examples on 10 million real-world images. We then investigate various factors in training recipe including data quality, training pipeline and VLM architecture. Our work features the first Internet-scale 3D spatial reasoning dataset in metric space. By training a VLM on such data, we significantly enhance its ability on both qualitative and quantitative spatial VQA. Finally, we demonstrate that this VLM unlocks novel downstream applications in chain-of-thought spatial reasoning and robotics due to its quantitative estimation capability. Website: https://***/

关键词： large language model multimodal spatial reasoning vision language model

来源：评论

学校读者我要写书评

暂无评论

没有更多数据了...

全选清除本页清除全部题录导出标记到“检索档案”

共500页 << < 19 20 21 22 23 24 25 26 27 28 > >>

检索报告对象比较合并检索0

隐藏清空

合并搜索

回到顶部

执行限定条件

内容：

评分：

请选择保存的检索档案：

请选择收藏分类：

订阅名称：

通借通还

温馨提示：

图书名称：

借书校区：

取书校区：

手机号码：

邮箱地址：

一卡通帐号：

电话和邮箱必须正确填写，我们会与您联系确认。

联系人：

所在院系：

联系邮箱：

联系电话：

内蒙古自治区呼和浩特市赛罕区大学西街235号邮编: 010021

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：