Search Results - Inner Mongolia University Library

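Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication title, PU = publisher, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, year of degree, year a standard was issued)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be written in uppercase, with one space on each side.

Examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication 计算机应用与软件, "Computer Applications and Software"; any field containing C++ or Basic; subject not Visual; years 2011-2016)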
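
The field-code syntax lends itself to building queries programmatically. The sketch below uses hypothetical helpers (`term`, `group`, and `combine` are illustrative names, not part of any library interface); they only assemble strings that follow the rules above, uppercasing operators and placing one space on each side, and do not model the library's own query parser.

```python
from __future__ import annotations

# Hypothetical helpers for the field-code syntax described above.
# They only assemble query strings; the library's search engine is not modeled.

FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value clause, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def group(expression: str) -> str:
    """Parenthesize a sub-expression so it binds before the operators around it."""
    return f"({expression})"


def combine(first: str, *steps: tuple[str, str]) -> str:
    """Chain clauses as `first OP clause OP clause ...`.

    Operators are uppercased and surrounded by exactly one space,
    as the search rules require.
    """
    query = first
    for operator, clause in steps:
        operator = operator.upper()
        if operator not in OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        query += f" {operator} {clause}"
    return query


# Rebuild Example 1 from its parts (note the lowercase "or" is normalized).
example_1 = combine(
    group(combine(term("K", "图书馆学"), ("or", term("K", "情报学")))),
    ("AND", term("A", "范并思")),
    ("AND", term("Y", "1982-2016")),
)
print(example_1)  # (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
```

Because operators are normalized inside `combine`, the uppercase rule is enforced by the helper rather than left to the caller.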

Refine results

Document type

23,001 conference papers
126 books
92 journal articles

Holdings scope

23,218 electronic documents
1 print holding

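Subject classification

13,623 Engineering
- 11,108 Computer Science and Technology...
- 3,479 Software Engineering
- 2,445 Mechanical Engineering
- 1,716 Optical Engineering
- 1,075 Electrical Engineering
- 1,014 Control Science and Engineering
- 785 Information and Communication Engineering
- 412 Instrument Science and Technology
- 352 Bioengineering
- 251 Biomedical Engineering (may confer...
- 196 Electronic Science and Technology (may...
- 114 Chemical Engineering and Technology
- 108 Safety Science and Engineering
- 100 Surveying and Mapping Science and Technology
- 88 Architecture
- 87 Transportation Engineering
- 84 Civil Engineering
3,494 Medicine
- 3,481 Clinical Medicine
- 81 Basic Medicine (may confer medical...
3,242 Science
- 1,939 Physics
- 1,640 Mathematics
- 563 Statistics (may confer science, ...
- 500 Biology
- 249 Systems Science
- 107 Chemistry
522 Management
- 311 Library, Information and Archives Manag...
- 224 Management Science and Engineering (may...
- 76 Business Administration
276 Arts
- 276 Design (may confer arts...
66 Law
- 63 Sociology
38 Agriculture
28 Education
22 Economics
10 Military Science
3 Literature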

Topics

10,187 computer vision
3,967 pattern recognit...
3,005 training
2,007 computational mo...
1,818 visualization
1,815 cameras
1,516 feature extracti...
1,481 shape
1,455 three-dimensiona...
1,438 image segmentati...
1,287 robustness
1,205 computer archite...
1,155 semantics
1,147 conferences
1,107 layout
1,092 computer science
1,087 object detection
1,025 benchmark testin...
970 codes
922 face recognition

Institutions

136 univ sci & techn...
121 univ chinese aca...
118 chinese univ hon...
107 carnegie mellon ...
101 tsinghua univers...
101 microsoft resear...
95 swiss fed inst t...
93 zhejiang univ pe...
82 university of sc...
81 zhejiang univers...
80 university of ch...
77 shanghai ai lab ...
72 shanghai jiao to...
69 national laborat...
67 microsoft res as...
67 alibaba grp peop...
64 adobe research
61 tsinghua univ pe...
60 peking univ peop...
59 univ oxford oxfo...

Authors

81 van gool luc
72 timofte radu
64 zhang lei
47 luc van gool
40 yang yi
40 li stan z.
37 loy chen change
34 chen chen
33 xiaoou tang
32 liu yang
32 qi tian
31 tian qi
31 sun jian
30 murino vittorio
30 pascal fua
29 darrell trevor
29 li fei-fei
28 li xin
28 ying shan
27 vasconcelos nuno

Language

23,137 English
53 Other
22 Chinese
5 Turkish
2 Japanese

Search criteria: "Any field=IEEE Conference on Computer Vision and Pattern Recognition Workshops"

23,219 records in total; showing records 331-340.

RMem: Restricted Memory Banks Improve Video Object Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhou, Junbao; Pang, Ziqi; Wang, Yu-Xiong
Affiliation: Univ Illinois, Champaign, IL 61820, USA

ISBN: 9798350353006 (print)

With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks. This diverges from the prevalent practice of expanding memory banks to accommodate extensive historical information. Our specially designed memory deciphering study offers a pivotal insight underpinning such a strategy: expanding memory banks, while seemingly beneficial, actually increases the difficulty for VOS modules to decode relevant features due to the confusion from redundant information. By restricting memory banks to a limited number of essential frames, we achieve a notable improvement in VOS accuracy. This process balances the importance and freshness of frames to maintain an informative memory bank within a bounded capacity. Additionally, restricted memory banks reduce the training-inference discrepancy in memory lengths compared with continuous expansion. This fosters new opportunities in temporal reasoning and enables us to introduce the previously overlooked temporal positional embedding. Finally, our insights are embodied in RMem (R for restricted), a simple yet effective VOS modification that excels at challenging VOS scenarios and establishes a new state of the art for object state changes (on the VOST dataset) and long videos (on the Long Videos dataset). Our code and demos are available at https://***/.

Keywords: egocentric vision; embodied AI; video object segmentation; video understanding
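
The abstract describes a memory bank of bounded capacity that balances frame importance and freshness. The following is a minimal sketch under stated assumptions, not RMem's published selection rule: each frame carries a hypothetical `importance` score, freshness is modeled as exponential decay with age (the `decay` parameter is an assumption), the first frame (which carries the reference mask in semi-supervised VOS) is pinned, and the lowest-scoring frame is evicted whenever capacity is exceeded.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    frame_idx: int
    importance: float  # hypothetical per-frame relevance score (e.g. attention mass)


@dataclass
class RestrictedMemoryBank:
    capacity: int
    decay: float = 0.9       # assumed freshness decay per frame of age
    keep_first: bool = True  # assume the annotated first frame is never evicted
    entries: list[MemoryEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.capacity < (2 if self.keep_first else 1):
            raise ValueError("capacity too small to hold the pinned frame plus one more")

    def _score(self, entry: MemoryEntry, now: int) -> float:
        # Importance discounted by age: old, weakly relevant frames score lowest.
        return entry.importance * self.decay ** (now - entry.frame_idx)

    def add(self, frame_idx: int, importance: float) -> None:
        """Insert the newest frame, then evict the lowest-scoring one if over capacity."""
        self.entries.append(MemoryEntry(frame_idx, importance))
        if len(self.entries) > self.capacity:
            start = 1 if self.keep_first else 0
            victim = min(
                range(start, len(self.entries)),
                key=lambda i: self._score(self.entries[i], frame_idx),
            )
            del self.entries[victim]

    def frames(self) -> list[int]:
        return [e.frame_idx for e in self.entries]


bank = RestrictedMemoryBank(capacity=3, decay=0.5)
for idx, score in enumerate([1.0, 0.2, 0.9, 0.8, 0.1]):
    bank.add(idx, score)
print(bank.frames())  # [0, 2, 3]
```

Multiplying importance by decayed freshness is only one simple trade-off; a learned or attention-derived score could replace `_score` without changing the eviction logic.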

MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cao, Xu; Zhou, Tong; Ma, Yunsheng; Ye, Wenqian; Cui, Can; Tang, Kun; Cao, Zhipeng; Liang, Kaizhao; Wang, Ziran; Rehg, James M.; Zheng, Chao
Affiliations: Tencent T Lab, Palo Alto, CA 94306, USA; Univ Illinois, Champaign, IL, USA; Purdue Univ, W Lafayette, IN, USA; Univ Virginia, Charlottesville, VA, USA; SambaNova Syst Inc, Palo Alto, CA, USA

ISBN: 9798350353006 (print)

Vision-language generative AI has demonstrated remarkable promise for empowering cross-modal scene understanding of autonomous driving and high-definition (HD) map systems. However, current benchmark datasets lack multi-modal point cloud, image, and language data pairs. Recent approaches utilize visual instruction learning and cross-modal prompt engineering to expand vision-language models into this domain. In this paper, we propose a new vision-language benchmark that can be used to finetune traffic and HD map domain-specific foundation models. Specifically, we annotate and leverage large-scale, broad-coverage traffic and map data extracted from huge HD map annotations, and use CLIP and LLaMA-2/Vicuna to finetune a baseline model with instruction-following data. Our experimental results across various algorithms reveal that while visual instruction-tuning large language models (LLMs) can effectively learn meaningful representations from MAPLM-QA, there remains significant room for further advancements. To facilitate applying LLMs and multi-modal data to self-driving research, we will release our visual-language QA data and the baseline models at ***/LLVM-AD/MAPLM.

Keywords: High-definition (HD) Map; Large Language Model; Multimodal Learning; Vision-Language Model; Visual Question Answering

Differentiable Display Photometric Stereo

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Choi, Seokjun; Yoon, Seungwoo; Nam, Giljoo; Lee, Seungyong; Baek, Seung-Hwan
Affiliations: POSTECH, Pohang, South Korea; Meta Real Labs, Menlo Pk, CA, USA

ISBN: 9798350353006 (print)

Photometric stereo leverages variations in illumination conditions to reconstruct surface normals. Display photometric stereo, which employs a conventional monitor as an illumination source, has the potential to overcome limitations often encountered in bulky and difficult-to-use conventional setups. In this paper, we present differentiable display photometric stereo (DDPS), addressing an often overlooked challenge in display photometric stereo: the design of display patterns. Departing from using heuristic display patterns, DDPS learns the display patterns that yield accurate normal reconstruction for a target system in an end-to-end manner. To this end, we propose a differentiable framework that couples basis-illumination image formation with analytic photometric-stereo reconstruction. The differentiable framework facilitates the effective learning of display patterns via auto-differentiation. Also, for training supervision, we propose to use 3D printing for creating a real-world training dataset, enabling accurate reconstruction on the target real-world setup. Finally, we exploit the fact that conventional LCD monitors emit polarized light, which allows for the optical separation of diffuse and specular reflections when combined with a polarization camera, leading to accurate normal reconstruction. Extensive evaluation of DDPS shows improved normal-reconstruction accuracy compared to heuristic patterns and demonstrates compelling properties such as robustness to pattern initialization, calibration errors, and simplifications in image formation and reconstruction.

Keywords: Computational illumination; computer graphics; computer vision; Photometric stereo
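
As background for the "analytic photometric-stereo reconstruction" half of DDPS only, the sketch below implements textbook Lambertian photometric stereo with three known, non-coplanar light directions: each intensity is modeled as I = albedo * max(0, n . l), so for an unshadowed pixel the scaled normal g = albedo * n solves a 3x3 linear system. It does not model display patterns, polarization, or the paper's end-to-end learning, and the light directions and test normal are made-up values.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested sequences."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a @ x = b for a 3x3 system using Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("light directions must not be coplanar")
    x = []
    for col in range(3):
        m = [list(row) for row in a]
        for r in range(3):
            m[r][col] = b[r]  # replace one column with the right-hand side
        x.append(det3(m) / d)
    return x


def render(normal, albedo, lights):
    """Lambertian intensity of one pixel under each light (attached shadows clamp to 0)."""
    return [albedo * max(0.0, sum(n * l for n, l in zip(normal, light))) for light in lights]


def reconstruct(intensities, lights):
    """Recover (unit normal, albedo) from three unshadowed intensities."""
    g = solve3(lights, intensities)  # g = albedo * normal
    albedo = math.sqrt(sum(c * c for c in g))
    return [c / albedo for c in g], albedo


lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]  # made-up unit directions
raw = (0.2, 0.1, 1.0)
length = math.sqrt(sum(c * c for c in raw))
true_normal = [c / length for c in raw]
normal, albedo = reconstruct(render(true_normal, 0.7, lights), lights)
print([round(c, 6) for c in normal], round(albedo, 6))
```

The linear solve is only valid when no light is in attached shadow; with more than three lights a least-squares solve would replace `solve3`.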

GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations

IEEE/CVF International Conference on Computer Vision (ICCV)

Authors: Melzi, Pietro; Rathgeb, Christian; Tolosana, Ruben; Vera-Rodriguez, Ruben; Lawatsch, Dominik; Domin, Florian; Schaubert, Maxim
Affiliations: Univ Autonoma Madrid, Biometr & Data Pattern Analyt Lab, Madrid, Spain; Secunet Secur Networks AG, Essen, Germany; Hsch Darmstadt, Darmstadt, Germany

ISBN: 9798350307443 (print)

Face recognition systems have significantly advanced in recent years, driven by the availability of large-scale datasets. However, several issues have recently come up, including privacy concerns that have led to the discontinuation of well-established public datasets. Synthetic datasets have emerged as a solution, even though current synthesis methods present other drawbacks such as limited intra-class variations, lack of realism, and unfair representation of demographic groups. This study introduces GANDiffFace, a novel framework for the generation of synthetic datasets for face recognition that combines the power of Generative Adversarial Networks (GANs) and Diffusion models to overcome the limitations of existing synthetic datasets. In GANDiffFace, we first propose the use of GANs to synthesize highly realistic identities and meet target demographic distributions. Subsequently, we fine-tune Diffusion models with the images generated with GANs, synthesizing multiple images of the same identity with a variety of accessories, poses, expressions, and contexts. We generate multiple synthetic datasets by changing GANDiffFace settings, and compare their mated and non-mated score distributions with the distributions provided by popular real-world datasets for face recognition, i.e., VGG2 and IJB-C. Our results show the feasibility of the proposed GANDiffFace, in particular the use of Diffusion models to enhance the (limited) intra-class variations provided by GANs towards the level of real-world datasets.

Keywords: diffusion; face recognition; generative AI; StyleGAN; synthetic
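
The abstract compares mated (same-identity) and non-mated (different-identity) score distributions. A minimal sketch of how such distributions are typically computed from face embeddings, assuming cosine similarity as the comparison score; the embeddings are toy 2-D vectors, not outputs of a real face-recognition model. The decidability index d' is one common summary of how well the two distributions separate; the paper may use other measures.

```python
from __future__ import annotations

import itertools
import math
import statistics


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def score_distributions(embeddings: dict[str, list[list[float]]]):
    """Split all pairwise cosine scores into mated and non-mated lists."""
    samples = [(identity, vec) for identity, vecs in embeddings.items() for vec in vecs]
    mated, non_mated = [], []
    for (id_a, a), (id_b, b) in itertools.combinations(samples, 2):
        (mated if id_a == id_b else non_mated).append(cosine(a, b))
    return mated, non_mated


def decidability(mated, non_mated):
    """d' = |mu_m - mu_n| / sqrt((var_m + var_n) / 2); larger means better separated."""
    mu_m, mu_n = statistics.fmean(mated), statistics.fmean(non_mated)
    var_m, var_n = statistics.pvariance(mated), statistics.pvariance(non_mated)
    return abs(mu_m - mu_n) / math.sqrt((var_m + var_n) / 2)


toy = {
    "id_a": [[1.0, 0.0], [0.9, 0.1]],  # toy 2-D "embeddings", not real model outputs
    "id_b": [[0.0, 1.0], [0.1, 0.9]],
}
mated, non_mated = score_distributions(toy)
print(len(mated), len(non_mated), round(decidability(mated, non_mated), 2))
```

A synthetic dataset with too little intra-class variation would show mated scores bunched unrealistically close to 1 compared with a real dataset, which is the kind of gap the comparison is meant to expose.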

Absolute Pose from One or Two Scaled and Oriented Features

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ventura, Jonathan; Kukelova, Zuzana; Sattler, Torsten; Barath, Daniel
Affiliations: Cal Poly, Dept Comp Sci & Software Engn, San Luis Obispo, CA, USA; Czech Tech Univ, Visual Recognit Grp, Fac Elect Engn, Prague, Czech Republic; Czech Tech Univ, Czech Inst Informat Robot & Cybernet, Prague, Czech Republic; Swiss Fed Inst Technol, Dept Comp Sci, Comp Vision & Geometry Grp, Zurich, Switzerland

ISBN: 9798350353006 (print)

Keypoints used for image matching often include an estimate of the feature scale and orientation. While recent work has demonstrated the advantages of using feature scales and orientations for relative pose estimation, relatively little work has considered their use for absolute pose estimation. We introduce minimal solutions for absolute pose from two oriented feature correspondences in the general case, or one scaled and oriented correspondence given a known vertical direction. Nowadays, assuming a known vertical direction is not particularly restrictive, as modern consumer devices, such as smartphones or drones, are equipped with Inertial Measurement Units (IMU) that provide the gravity direction by default. Compared to traditional absolute pose methods requiring three point correspondences, our solvers need a smaller minimal sample, reducing the cost and complexity of robust estimation. Evaluations on large-scale and public real datasets demonstrate the advantage of our methods for fast and accurate localization in challenging conditions. Code is available at https://***/danini/absolute-pose-from-orientedand-scaled-features.

Keywords: absolute pose; computer vision; geometric vision; image-based localization; minimal solver
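
The claim that a smaller minimal sample reduces the cost of robust estimation can be made concrete with the standard RANSAC iteration bound: to draw at least one all-inlier sample with confidence p when a fraction w of correspondences are inliers, a solver needing s correspondences requires N = ceil(log(1 - p) / log(1 - w^s)) iterations. The sketch compares s = 3 (a classic three-point absolute pose solver), s = 2 (two oriented features), and s = 1 (one scaled and oriented feature with known gravity); the inlier ratio of 0.5 is an arbitrary illustrative value, not a figure from the paper.

```python
import math


def ransac_iterations(inlier_ratio: float, sample_size: int, confidence: float = 0.99) -> int:
    """Iterations needed to draw one all-inlier minimal sample with the given confidence."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    p_good_sample = inlier_ratio ** sample_size  # chance a random sample is all inliers
    if p_good_sample >= 1.0:
        return 1  # every sample is all-inlier
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good_sample))


for s in (3, 2, 1):
    print(s, ransac_iterations(0.5, s))
# 3 35
# 2 17
# 1 7
```

Because w^s shrinks geometrically with s, the gap between the solvers widens further as the inlier ratio drops.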

Discovering and Mitigating Visual Biases through Keyword Explanation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Younghyun; Mo, Sangwoo; Kim, Minkyu; Lee, Kyungmin; Lee, Jaeho; Shin, Jinwoo
Affiliations: Korea Adv Inst Sci & Technol, Daejeon, South Korea; Univ Michigan, Ann Arbor, MI 48109, USA; KRAFTON, Seongnam, South Korea; POSTECH, Pohang, South Korea

ISBN: 9798350353006 (print)

Addressing biases in computer vision models is crucial for real-world AI deployments. However, mitigating visual biases is challenging due to their unexplainable nature: they are often identified only indirectly, through visualization or sample statistics, which necessitates additional human supervision for interpretation. To tackle this issue, we propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords. Specifically, we extract common keywords from the captions of mispredicted images to identify potential biases in the model. We then validate these keywords by measuring their similarity to the mispredicted images using a vision-language scoring model. The keyword explanation form of visual bias offers several advantages, such as clear group names for bias discovery and a natural extension for debiasing using these group names. Our experiments demonstrate that B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C. Additionally, B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet. For example, we discovered a contextual bias between "bee" and "flower" in ImageNet. We also highlight various applications of B2T keywords, including debiased training, CLIP prompting, and model comparison.

Keywords: bias and fairness; explainable AI; vision-language model
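
B2T first extracts common keywords from the captions of mispredicted images and then validates them with a vision-language similarity score. The sketch below covers only a simplified version of the first step, assuming a keyword is suspicious when it appears in a larger fraction of mispredicted-image captions than correctly predicted ones; the tokenizer, stop-word list, and example captions are hypothetical, and the CLIP-based validation step is not modeled.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical minimal stop-word list; a real pipeline would use a proper NLP toolkit.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "are", "at", "to"}


def caption_keywords(caption: str) -> set[str]:
    """Lowercased alphabetic words of a caption, minus stop words (counted once per caption)."""
    return {w for w in re.findall(r"[a-z]+", caption.lower()) if w not in STOPWORDS}


def rank_bias_keywords(mispredicted: list[str], correct: list[str], top_k: int = 5) -> list[str]:
    """Rank keywords by how much more often they occur in mispredicted captions."""
    miss = Counter(k for c in mispredicted for k in caption_keywords(c))
    hit = Counter(k for c in correct for k in caption_keywords(c))

    def gap(k: str) -> float:
        # Fraction of mispredicted captions with k minus fraction of correct captions with k.
        return miss[k] / len(mispredicted) - hit[k] / max(len(correct), 1)

    return sorted(miss, key=lambda k: (-gap(k), k))[:top_k]


mispredicted = ["a bee on a flower", "a pink flower with a bee", "a bee flying over a flower"]
correct = ["a bee on a wall", "a bee in a hive"]
print(rank_bias_keywords(mispredicted, correct))
# ['flower', 'flying', 'over', 'pink', 'bee']
```

"bee" is common to both groups, so its gap is zero and it sinks to the bottom, while "flower" surfaces as the candidate bias keyword.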

Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Jaeha; Oh, Junghun; Lee, Kyoung Mu
Affiliations: Seoul Natl Univ, Dept ECE, Seoul, South Korea; Seoul Natl Univ, ASRI, Seoul, South Korea; Seoul Natl Univ, IPAI, Seoul, South Korea

ISBN: 9798350353013; 9798350353006 (print)

In real-world scenarios, image recognition tasks, such as semantic segmentation and object detection, often pose greater challenges due to the lack of information available within low-resolution (LR) content. Image super-resolution (SR) is one of the promising solutions for addressing these challenges. However, due to the ill-posed nature of SR, it is challenging for typical SR methods to restore task-relevant high-frequency content, which may dilute the advantage of using an SR method. Therefore, in this paper, we propose Super-Resolution for Image Recognition (SR4IR), which effectively guides the generation of SR images that help achieve satisfactory image recognition performance when processing LR images. The critical component of our SR4IR is the task-driven perceptual (TDP) loss, which enables the SR network to acquire task-specific knowledge from a network tailored for a specific task. Moreover, we propose a cross-quality patch mix and an alternate training framework that significantly enhance the efficacy of the TDP loss by addressing potential problems when employing it. Through extensive experiments, we demonstrate that our SR4IR achieves outstanding task performance by generating SR images useful for a specific image recognition task, including semantic segmentation, object detection, and image classification. The implementation code is available at https://***/JaehaKim97/SR4IR.

Keywords: Low-level vision; Perceptual loss; Super-resolution; Task-aware restoration
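
The task-driven perceptual (TDP) loss compares SR and high-resolution images in the feature space of a network trained for the downstream task rather than pixel by pixel. The toy sketch below assumes a hypothetical stand-in "task network" that takes first differences of a 1-D signal (a crude edge detector) to show why this matters: a blurred output can have a lower pixel error than a brightness-shifted one while losing more of the edge content a recognition task depends on. SR4IR itself uses a trained recognition network, and its cross-quality patch mix and alternate training are not modeled here.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def edge_features(signal):
    """Hypothetical stand-in for a frozen task network: first differences respond to edges."""
    return [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]


def tdp_loss(sr, hr, task_features=edge_features):
    """Distance between task-network features of the SR output and the HR target."""
    return mse(task_features(sr), task_features(hr))


hr = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
sr_shifted = [x + 0.2 for x in hr]           # uniform brightness offset, edges intact
sr_blurred = [0.0, 0.2, 0.8, 0.8, 0.2, 0.0]  # smoothed edges

for name, sr in (("shifted", sr_shifted), ("blurred", sr_blurred)):
    print(name, "pixel MSE:", round(mse(sr, hr), 4), "TDP:", round(tdp_loss(sr, hr), 4))
```

A pixel loss alone would prefer the blurred output here; the feature-space term penalizes it, which is the intuition behind guiding SR with a task network.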

Area Under the ROC Curve Maximization for Metric Learning

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Gajic, Bojana; Amato, Ariel; Baldrich, Ramon; van de Weijer, Joost; Gatta, Carlo
Affiliations: Vintra Inc, Barcelona, Spain; Comp Vis Ctr, Barcelona, Spain

ISBN: 9781665487399 (print)

Most popular metric learning losses have no direct relation with the evaluation metrics that are subsequently applied to evaluate their performance. We hypothesize that training a metric learning model by maximizing the area under the ROC curve (which is a typical performance measure of recognition systems) can induce an implicit ranking suitable for retrieval problems. This hypothesis is supported by previous work that proved that a curve dominates in ROC space if and only if it dominates in Precision-Recall space. To test this hypothesis, we design and maximize an approximated, differentiable relaxation of the area under the ROC curve. The proposed AUC loss achieves state-of-the-art results on two large-scale retrieval benchmark datasets (Stanford Online Products and DeepFashion In-Shop). Moreover, the AUC loss achieves comparable performance to more complex, domain-specific, state-of-the-art methods for vehicle re-identification.

Keywords: Training; computer vision; conferences; Area measurement; Benchmark testing; pattern recognition
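
The AUC equals the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair (the Wilcoxon-Mann-Whitney statistic). That count of step functions has no useful gradient, so it must be relaxed before it can be maximized. One common differentiable surrogate, assumed here rather than taken from the paper, replaces each step with a sigmoid of temperature tau. The sketch computes both the exact and the relaxed AUC from lists of similarity scores, with the loss defined as one minus the relaxed value.

```python
import math


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def exact_auc(pos, neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def soft_auc(pos, neg, temperature=0.1):
    """Sigmoid relaxation of exact_auc; approaches it as temperature -> 0."""
    total = sum(_sigmoid((p - n) / temperature) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def auc_loss(pos, neg, temperature=0.1):
    """Loss to minimize: 1 - relaxed AUC, smooth in every similarity score."""
    return 1.0 - soft_auc(pos, neg, temperature)


pos_sims = [0.9, 0.3]  # toy similarities of matching pairs
neg_sims = [0.5, 0.1]  # toy similarities of non-matching pairs
print(exact_auc(pos_sims, neg_sims))  # 0.75
print(round(auc_loss(pos_sims, neg_sims), 4))
```

The temperature trades gradient smoothness against fidelity: a large tau gives informative gradients for badly ranked pairs, while a small tau tracks the exact AUC more closely.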

Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Ma, Weijian; Chen, Shuaiqi; Lou, Yunzhong; Li, Xueyang; Zhou, Xiangdong
Affiliation: Fudan Univ, Sch Comp Sci & Technol, Shanghai, Peoples R China

ISBN: 9798350353006 (print)

Reconstructing CAD construction sequences from raw 3D geometry serves as an interface between real-world objects and digital designs. In this paper, we propose CAD-Diffuser, a multimodal diffusion scheme aiming to integrate the top-down design paradigm into generative reconstruction. In particular, we unify CAD point clouds and CAD construction sequences at the token level, guiding our proposed multimodal diffusion strategy to understand and link the geometry and the design intent concentrated in construction sequences. Leveraging the strong decoding abilities of language models, the forward process is modeled as a random walk between the original token and the [MASK] token, while the reverse process naturally fits the masked token modeling scheme. A volume-based noise schedule is designed to encourage outline-first generation, decomposing the top-down design methodology into a machine-understandable procedure. For tokenizing CAD data of multiple modalities, we introduce a tokenizer with a self-supervised face segmentation task to compress local and global geometric information for CAD point clouds, and the CAD construction sequence is transformed into a primitive token string. Experimental results show that our CAD-Diffuser can perceive geometric details and that its results are more likely to be reused by human designers.

Keywords: Computer-Aided Design; Diffusion Models; Point Cloud
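
The forward process moves each token toward a [MASK] token, and the reverse process is masked-token prediction. One simple reading of this is the absorbing-state ("mask") corruption used in masked discrete diffusion, sketched below with an assumed linear schedule in which each token is independently masked with probability t/T at step t. The paper's volume-based schedule and its tokenizer are not reproduced, and the CAD tokens shown are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_probability(t: int, num_steps: int) -> float:
    """Assumed linear schedule: nothing masked at t=0, everything masked at t=num_steps."""
    if not 0 <= t <= num_steps:
        raise ValueError("t must lie in [0, num_steps]")
    return t / num_steps


def forward_corrupt(tokens, t, num_steps, rng):
    """Sample x_t given x_0: mask each token independently with the scheduled probability."""
    p = mask_probability(t, num_steps)
    return [MASK if rng.random() < p else tok for tok in tokens]


# Hypothetical CAD construction-sequence tokens.
sequence = ["<sketch>", "line", "line", "arc", "<extrude>", "depth_12"]
rng = random.Random(0)
for t in (0, 5, 10):
    print(t, forward_corrupt(sequence, t, num_steps=10, rng=rng))
```

A denoiser trained on pairs (corrupted sequence, original sequence) then learns to fill in the masked positions, which is the masked-token-modeling view of the reverse process; a volume-aware schedule would change which tokens tend to be masked first, not this overall structure.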

Multi-Modal Hallucination Control by Visual Information Grounding

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Favero, Alessandro; Zancato, Luca; Trager, Matthew; Choudhary, Siddharth; Perera, Pramuditha; Achille, Alessandro; Swaminathan, Ashwin; Soatto, Stefano
Affiliation: AWS AI Labs, Lausanne, Switzerland

ISBN: 9798350353006 (print)

Generative Vision-Language Models (VLMs) are prone to generate plausible-sounding textual answers that, however, are not always grounded in the input image. We investigate this phenomenon, usually referred to as "hallucination", and show that it stems from an excessive reliance on the language prior. In particular, we show that as more tokens are generated, the reliance on the visual prompt decreases, and this behavior strongly correlates with the emergence of hallucinations. To reduce hallucinations, we introduce Multi-Modal Mutual-Information Decoding (M3ID), a new sampling method for prompt amplification. M3ID amplifies the influence of the reference image over the language prior, hence favoring the generation of tokens with higher mutual information with the visual prompt. M3ID can be applied to any pre-trained autoregressive VLM at inference time without necessitating further training and with minimal computational overhead. If training is an option, we show that M3ID can be paired with Direct Preference Optimization (DPO) to improve the model's reliance on the prompt image without requiring any labels. Our empirical findings show that our algorithms maintain the fluency and linguistic capabilities of pre-trained VLMs while reducing hallucinations by mitigating visually ungrounded answers. Specifically, for the LLaVA 13B model, M3ID and M3ID+DPO reduce the percentage of hallucinated objects in captioning tasks by 25% and 28%, respectively, and improve the accuracy on VQA benchmarks such as POPE by 21% and 24%.

Keywords: language reasoning vision
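
M3ID favors tokens with high mutual information with the image. For a candidate token y, the pointwise mutual information with the image is log p(y | image, text) - log p(y | text), so one way to amplify the image's influence is to add a weighted copy of that difference to the image-conditioned log-probabilities, with a weight that grows as generation proceeds, since the abstract reports that visual reliance fades over time. This is a guidance-style sketch in the spirit of classifier-free guidance, with a made-up weight schedule (`strength`, `rate`) and toy logits; it is not the exact M3ID update.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def amplified_scores(logits_with_image, logits_text_only, step, strength=1.0, rate=0.1):
    """Image-conditioned log-probs plus a step-dependent boost of their PMI with the image."""
    cond = log_softmax(logits_with_image)
    prior = log_softmax(logits_text_only)
    # Assumed schedule: weight is 0 at step 0 and approaches `strength` for later tokens.
    weight = strength * (1.0 - math.exp(-rate * step))
    return [c + weight * (c - u) for c, u in zip(cond, prior)]


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


vocab = ["dog", "frisbee", "car"]
with_image = [2.0, 0.0, 2.1]  # toy logits: the image supports "dog" almost as much as "car"
text_only = [0.0, 0.0, 3.0]   # toy language prior that strongly expects "car"
for step in (0, 50):
    print(step, vocab[greedy_pick(amplified_scores(with_image, text_only, step))])
# 0 car
# 50 dog
```

At step 0 the rule reduces to ordinary greedy decoding; later, the PMI term penalizes "car", whose probability comes mostly from the language prior, and the image-supported token wins.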

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code: 010021
