Search Results - Inner Mongolia University Library


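Help

Field codes:

T=Title (book title, article title), A=Author (responsible party), K=Subject heading, P=Publication title, PU=Publisher, O=Organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=Subject classification number, U=All fields, Y=Year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject library science or information science, author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication Computer Applications and Software, any field C++ or Basic, excluding subject Visual, years 2011-2016)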
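The syntax rules above can be illustrated with a small query builder. This is a hypothetical helper written for illustration, not part of the library's search system: the names `term`, `group`, and `combine` are assumptions, and it enforces only the stated rules (uppercase operators, one space on each side) plus the listed field codes.

```python
# Hypothetical helper for the catalogue's search syntax; not the library's own code.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Build one FIELD=value clause, rejecting field codes not listed in the help."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(*parts: str) -> str:
    """Join clauses and operators with exactly one space between them.

    Operators must be uppercase, so lowercase 'and'/'or'/'not' is rejected.
    """
    for part in parts:
        if part.upper() in OPERATORS and part not in OPERATORS:
            raise ValueError(f"operator must be uppercase: {part!r}")
    return " ".join(parts)


def group(*parts: str) -> str:
    """Combine parts and wrap the result in parentheses."""
    return "(" + combine(*parts) + ")"


# Rebuild Example 1 from the help text.
example_1 = combine(
    group(term("K", "图书馆学"), "OR", term("K", "情报学")),
    "AND", term("A", "范并思"),
    "AND", term("Y", "1982-2016"),
)
```

`example_1` evaluates to the Example 1 string exactly; Example 2 can be assembled the same way by nesting `group(...)` for the parenthesized part.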


Refine search results

Document type

22,774 conference papers
111 journal articles
23 books

Collection scope

22,907 electronic documents
1 print holding

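Subject classification

13,400 Engineering
- 10,880 Computer Science and Technology...
- 3,450 Software Engineering
- 2,429 Mechanical Engineering
- 1,723 Optical Engineering
- 1,011 Control Science and Engineering
- 998 Electrical Engineering
- 761 Information and Communication Engineering
- 393 Instrument Science and Technology
- 337 Bioengineering
- 257 Biomedical Engineering (may award...
- 214 Electronic Science and Technology (may...
- 113 Chemical Engineering and Technology
- 112 Safety Science and Engineering
- 98 Surveying and Mapping Science and Technology
- 93 Transportation Engineering
- 86 Architecture
- 82 Civil Engineering
3,361 Medicine
- 3,347 Clinical Medicine
- 79 Basic Medicine (may award medicine...
3,251 Science
- 1,953 Physics
- 1,665 Mathematics
- 567 Statistics (may award science, ...
- 484 Biology
- 245 Systems Science
- 109 Chemistry
506 Management
- 299 Library, Information and Archives Manage...
- 219 Management Science and Engineering (may...
- 75 Business Administration
252 Art
- 252 Design (may award art...
62 Law
- 59 Sociology
40 Agriculture
25 Education
19 Economics
11 Military Science
3 Literature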

Topic

10,126 computer vision
4,026 pattern recognit...
2,900 training
1,958 computational mo...
1,792 cameras
1,759 visualization
1,484 shape
1,466 image segmentati...
1,445 feature extracti...
1,412 three-dimensiona...
1,288 robustness
1,170 computer archite...
1,146 layout
1,142 computer science
1,134 semantics
1,071 object detection
1,043 conferences
1,009 benchmark testin...
967 codes
810 face recognition

Institution

135 univ sci & techn...
118 univ chinese aca...
118 chinese univ hon...
110 carnegie mellon ...
99 tsinghua univers...
99 microsoft resear...
94 swiss fed inst t...
92 zhejiang univ pe...
82 university of sc...
81 zhejiang univers...
77 shanghai ai lab ...
77 university of ch...
72 shanghai jiao to...
68 microsoft res as...
65 national laborat...
65 alibaba grp peop...
63 adobe research
63 tsinghua univ pe...
60 peking univ peop...
59 peng cheng labor...

Author

78 van gool luc
72 timofte radu
63 zhang lei
45 luc van gool
40 yang yi
37 loy chen change
33 xiaoou tang
33 li stan z.
33 qi tian
32 sun jian
31 liu yang
31 li fei-fei
30 chen chen
30 tian qi
30 pascal fua
29 darrell trevor
28 ying shan
27 li xin
27 vasconcelos nuno
27 hanqing lu

Language

22,719 English
162 Other
20 Chinese
5 Turkish
2 Japanese

Search condition: "Any field=1994 IEEE Computer-Society Conference on Computer Vision and Pattern Recognition"

22,908 records in total; showing records 331-340.

Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Du, Zhipeng; Shi, Miaojing; Deng, Jiankang
Affiliations: Kings Coll London Dept Informat, London, England; Tongji Univ Coll Elect & Informat Engn, Shanghai, Peoples R China; Imperial Coll London Dept Comp, London, England; Huawei London Res, London, England

ISBN: (print) 9798350353006

Detecting objects in low-light scenarios presents a persistent challenge, as detectors trained on well-lit data exhibit significant performance degradation on low-light data due to low visibility. Previous methods mitigate this issue by exploring image enhancement or object detection techniques with real low-light image datasets. However, progress is impeded by the inherent difficulty of collecting and annotating low-light images. To address this challenge, we propose to boost low-light object detection with zero-shot day-night domain adaptation, which aims to generalize a detector from well-lit scenarios to low-light ones without requiring real low-light data. Revisiting Retinex theory in low-level vision, we first design a reflectance representation learning module to learn Retinex-based illumination invariance in images with a carefully designed illumination invariance reinforcement strategy. Next, an interchange-redecomposition-coherence procedure is introduced to improve on the vanilla Retinex image decomposition process by performing two sequential image decompositions and introducing a redecomposition cohering loss. Extensive experiments on ExDark, DARK FACE, and CODaN datasets show strong low-light generalizability of our method. Our code is available at https://***/ZPDu/DAI-Net.

Keywords: Low-light vision; Object Detection; Zero-shot Domain Adaptation
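The abstract builds on the Retinex model, in which an observed image is the pixel-wise product of reflectance and illumination, I = R · L. The sketch below is not the paper's learned reflectance module; it uses a hand-picked illumination estimate (the maximum over RGB channels, an assumption chosen for simplicity) to show why reflectance is a natural illumination-invariant representation: darkening the whole image leaves the recovered reflectance essentially unchanged.

```python
# Toy Retinex decomposition I = R * L; not the paper's learned module.
def decompose(image, eps=1e-6):
    """Split an RGB image into reflectance and illumination.

    image: list of (r, g, b) floats in [0, 1].
    Illumination is estimated per pixel as the max over channels (an assumed,
    simple prior); reflectance is the pixel divided by that estimate.
    eps avoids division by zero on black pixels.
    """
    illumination = [max(pixel) for pixel in image]
    reflectance = [
        tuple(c / (light + eps) for c in pixel)
        for pixel, light in zip(image, illumination)
    ]
    return reflectance, illumination


def darken(image, factor):
    """Simulate a low-light capture by scaling every channel by the same factor."""
    return [tuple(factor * c for c in pixel) for pixel in image]


well_lit = [(0.8, 0.4, 0.2), (0.3, 0.6, 0.9)]
low_light = darken(well_lit, 0.1)

r_day, l_day = decompose(well_lit)
r_night, l_night = decompose(low_light)
```

Here `l_night` is one tenth of `l_day`, while `r_night` matches `r_day` up to the tiny `eps` effect, which is the invariance the paper's module learns rather than hard-codes.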

LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Zhao, Linqing; Xu, Xiuwei; Wang, Ziwei; Zhang, Yunpeng; Zhang, Borui; Zheng, Wenzhao; Du, Dalong; Zhou, Jie; Lu, Jiwen
Affiliations: Tsinghua Univ Dept Automat, Beijing, Peoples R China; Tianjin Univ Sch Elect & Informat Engn, Tianjin, Peoples R China; PhiGent Robot, Beijing, Peoples R China

ISBN: (print) 9798350353006

In this paper, we present a tensor decomposition and low-rank recovery approach (LowRankOcc) for vision-based 3D semantic occupancy prediction. Conventional methods model outdoor scenes with fine-grained 3D grids, but the sparsity of non-empty voxels introduces considerable spatial redundancy, leading to potential overfitting risks. In contrast, our approach leverages the intrinsic low-rank property of 3D occupancy data, factorizing voxel representations into low-rank components to efficiently mitigate spatial redundancy without sacrificing performance. Specifically, we present the Vertical-Horizontal (VH) decomposition block, which factorizes 3D tensors into vertical vectors and horizontal matrices. With our "decomposition-encoding-recovery" framework, we encode 3D contexts with only 1/2D convolutions and poolings, and subsequently recover the encoded compact yet informative context features back to voxel representations. Experimental results demonstrate that LowRankOcc achieves state-of-the-art performance in semantic scene completion on the SemanticKITTI dataset and 3D occupancy prediction on the nuScenes dataset.

Keywords: 3D semantic occupancy; tensor decomposition
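To make the vertical-vector and horizontal-matrix idea concrete, the sketch below factorizes an (H, W, Z) volume with a truncated SVD of its (H*W, Z) unfolding. This is a generic low-rank decomposition chosen for illustration, not the paper's learned VH block, and the toy volume (a ground plane plus one box) is made up.

```python
import numpy as np


def vh_factorize(volume: np.ndarray, rank: int):
    """Factor an (H, W, Z) volume into `rank` horizontal (H, W) maps and
    vertical (Z,) profiles via a truncated SVD of the (H*W, Z) unfolding.
    A generic low-rank sketch, not the learned VH block from the paper."""
    h, w, z = volume.shape
    unfolded = volume.reshape(h * w, z)              # one row per ground cell
    u, s, vt = np.linalg.svd(unfolded, full_matrices=False)
    horizontal = (u[:, :rank] * s[:rank]).T.reshape(rank, h, w)
    vertical = vt[:rank]                             # (rank, Z) height profiles
    return horizontal, vertical


def vh_reconstruct(horizontal: np.ndarray, vertical: np.ndarray) -> np.ndarray:
    """volume[h, w, z] = sum_r horizontal[r, h, w] * vertical[r, z]."""
    return np.einsum("rhw,rz->hwz", horizontal, vertical)


# Made-up toy scene: occupied ground plane plus one 2x3 footprint, 4 voxels tall.
scene = np.zeros((8, 8, 6))
scene[:, :, 0] = 1.0
scene[2:4, 2:5, :4] = 1.0

horizontal, vertical = vh_factorize(scene, rank=2)
approx = vh_reconstruct(horizontal, vertical)
```

For this toy volume the rank-2 factors store 2 × (8·8 + 6) = 140 numbers instead of 8·8·6 = 384, and they reconstruct it exactly because every ground cell has one of only two height profiles.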

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Cai, Mu; Liu, Haotian; Mustikovela, Siva Karthik; Meyer, Gregory P.; Chai, Yuning; Park, Dennis; Lee, Yong Jae
Affiliations: Univ Wisconsin, Madison, WI 53706, USA; Cruise LLC, San Francisco, CA, USA

ISBN: (print) 9798350353006

While existing large vision-language multimodal models focus on whole image understanding, there is a prominent gap in achieving region-specific comprehension. Current approaches that use textual coordinates or spatial encodings often fail to provide a user-friendly interface for visual prompting. To address this challenge, we introduce a novel multimodal model capable of decoding arbitrary (free-form) visual prompts. This allows users to intuitively mark images and interact with the model using natural cues like a "red bounding box" or "pointed arrow". Our simple design directly overlays visual markers onto the RGB image, eliminating the need for complex region encodings, yet achieves state-of-the-art performance on region-understanding tasks like Visual7W, PointQA, and Visual Commonsense Reasoning benchmark. Furthermore, we present ViP-Bench, a comprehensive benchmark to assess the capability of models in understanding visual prompts across multiple dimensions, enabling future research in this domain. Code, data, and model are publicly available.

Keywords: Large Language Models; Large Multimodal Models; Multimodal Benchmark; Region-level Understanding; Vision-Language Models; Visual Commonsense Reasoning; Visual Prompts
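The key design choice in the abstract is to draw visual prompts straight into the pixels rather than encode regions separately. A minimal sketch of that overlay step on a plain nested-list image follows; the function name, the red color, and the outline thickness are illustrative assumptions, not ViP-LLaVA's code.

```python
def overlay_box(image, top, left, bottom, right, color=(255, 0, 0), thickness=2):
    """Return a copy of `image` with a rectangle outline drawn into its pixels.

    image: list of rows, each a list of (r, g, b) tuples.
    The box spans rows top..bottom and columns left..right inclusive; pixels
    within `thickness` of any side are recolored and the interior is untouched.
    The color and thickness defaults are illustrative assumptions.
    """
    out = [list(row) for row in image]  # copy rows so the input is not modified
    height, width = len(out), len(out[0])
    for y in range(max(0, top), min(height, bottom + 1)):
        for x in range(max(0, left), min(width, right + 1)):
            on_edge = (
                y - top < thickness or bottom - y < thickness
                or x - left < thickness or right - x < thickness
            )
            if on_edge:
                out[y][x] = color
    return out


black = [[(0, 0, 0)] * 10 for _ in range(10)]
prompted = overlay_box(black, top=2, left=2, bottom=7, right=7, thickness=1)
```

The marked image is then just another RGB input, which is why this style of prompting needs no separate region encoding.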

Sharingan: A Transformer Architecture for Multi-Person Gaze Following

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Tafasca, Samy; Gupta, Anshul; Odobez, Jean-Marc
Affiliations: Idiap Res Inst, Martigny, Switzerland; Ecole Polytech Fed Lausanne, Lausanne, Switzerland

ISBN: (print) 9798350353013; 9798350353006

Gaze is a powerful form of non-verbal communication that humans develop from an early age. As such, modeling this behavior is an important task that can benefit a broad set of application domains ranging from robotics to sociology. In particular, the gaze following task in computer vision is defined as the prediction of the 2D pixel coordinates where a person in the image is looking. Previous attempts in this area have primarily centered on CNN-based architectures, but they have been constrained by the need to process one person at a time, which proves to be highly inefficient. In this paper, we introduce a novel and effective multi-person transformer-based architecture for gaze prediction. While there exist prior works using transformers for multi-person gaze prediction [38, 39], they use a fixed set of learnable embeddings to decode both the person and its gaze target, which requires a matching step afterward to link the predictions with the annotations. Thus, it is difficult to quantitatively evaluate these methods reliably with the available benchmarks, or integrate them into a larger human behavior understanding system. Instead, we are the first to propose a multi-person transformer-based architecture that maintains the original task formulation and ensures control over the people fed as input. Our main contribution lies in encoding the person-specific information into a single controlled token to be processed alongside image tokens and using its output for prediction based on a novel multiscale decoding mechanism. Our new architecture achieves state-of-the-art results on the GazeFollow, VideoAttentionTarget, and ChildPlay datasets and outperforms comparable multi-person architectures with a notable margin. Our code, checkpoints, and data extractions will be made publicly available soon.

Keywords: computer vision; deep learning; gaze following

Forecasting of 3D Whole-body Human Poses with Grasping Objects

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yan, Haitao; Cui, Qiongjie; Xie, Jiexin; Guo, Shijie
Affiliations: Fudan Univ Acad Engn & Technol, Shanghai, Peoples R China; Nanjing Univ Sci & Technol, Nanjing, Peoples R China

ISBN: (print) 9798350353013; 9798350353006

In the context of computer vision and human-robot interaction, forecasting 3D human poses is crucial for understanding human behavior and enhancing the predictive capabilities of intelligent systems. While existing methods have made significant progress, they often focus on predicting major body joints, overlooking fine-grained gestures and their interaction with objects. Human hand movements, particularly during object interactions, play a pivotal role and provide more precise expressions of human poses. This work fills this gap and introduces a novel paradigm: forecasting 3D whole-body human poses with a focus on grasping objects. This task involves predicting activities across all joints in the body and hands, encompassing the complexities of internal heterogeneity and external interactivity. To tackle these challenges, we also propose a novel approach, C3HOST (cross-context cross-modal consolidation for 3D whole-body pose forecasting), which effectively handles the complexities of internal heterogeneity and external interactivity. C3HOST involves distinct steps, including heterogeneous content encoding and alignment, and cross-modal feature learning and interaction. These enable us to predict activities across all body and hand joints, ensuring high-precision whole-body human pose prediction, even during object grasping. Extensive experiments on two benchmarks demonstrate that our model significantly enhances the accuracy of whole-body human motion prediction. The project page is available at https://***/view/c3host.

Keywords: 3D computer vision; cross-modal learning; human motion analysis; human motion prediction

GenZI: Zero-Shot 3D Human-Scene Interaction Generation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Lei; Dai, Angela
Affiliations: Tech Univ Munich, Munich, Germany

ISBN: (print) 9798350353006

Can we synthesize 3D humans interacting with scenes without learning from any 3D human-scene interaction data? We propose GenZI, the first zero-shot approach to generating 3D human-scene interactions. Key to GenZI is our distillation of interaction priors from large vision-language models (VLMs), which have learned a rich semantic space of 2D human-scene compositions. Given a natural language description and a coarse point location of the desired interaction in a 3D scene, we first leverage VLMs to imagine plausible 2D human interactions inpainted into multiple rendered views of the scene. We then formulate a robust iterative optimization to synthesize the pose and shape of a 3D human model in the scene, guided by consistency with the 2D interaction hypotheses. In contrast to existing learning-based approaches, GenZI circumvents the conventional need for captured 3D interaction data, and allows for flexible control of the 3D interaction synthesis with easy-to-use text prompts. Extensive experiments show that our zero-shot approach has high flexibility and generality, making it applicable to diverse scene types, including both indoor and outdoor environments.

Keywords: Human-Scene Interaction; Vision-Language Models; Zero-Shot

SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Song, Chull Hwan; Hwang, Taebaek; Yoon, Jooyoung; Choi, Shunghyun; Gu, Yeong Hyeon
Affiliations: Dealicious Inc, Seoul, South Korea; Sejong Univ, Seoul, South Korea

ISBN: (print) 9798350353006

Vision-language models (VLMs) have made significant strides in cross-modal understanding through large-scale paired datasets. However, in the fashion domain, datasets often exhibit a disparity between the information conveyed in image and text. This issue stems from datasets containing multiple images of a single fashion item all paired with one text, leading to cases where some textual details are not visible in individual images. This mismatch, particularly when non-co-occurring elements are masked, undermines the training of conventional VLM objectives like Masked Language Modeling and Masked Image Modeling, thereby hindering the model's ability to accurately align fine-grained visual and textual features. Addressing this problem, we propose Synchronized attentional Masking (SyncMask), which generates masks that pinpoint the image patches and word tokens where the information co-occurs in both image and text. This synchronization is accomplished by harnessing cross-attentional features obtained from a momentum model, ensuring a precise alignment between the two modalities. Additionally, we enhance grouped batch sampling with semi-hard negatives, effectively mitigating false negative issues in Image-Text Matching and Image-Text Contrastive learning objectives within fashion datasets. Our experiments demonstrate the effectiveness of the proposed approach, outperforming existing methods in three downstream tasks.

Keywords: Visual languages
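One way to read "masks that pinpoint the image patches and word tokens where the information co-occurs" is as a selection over a text-to-patch cross-attention matrix. The sketch below scores each token and each patch by its strongest cross-modal attention weight and keeps the top k of each; this scoring rule, the function name, and the toy weights are assumptions for illustration, not the SyncMask procedure or its momentum model.

```python
def synchronized_masks(attn, k_text, k_image):
    """Pick word tokens and image patches whose content appears in both modalities.

    attn[i][j] is an assumed text-to-image cross-attention weight from word
    token i to image patch j. Each token (row) and each patch (column) is scored
    by its strongest cross-modal weight, and the top-k of each are returned as
    sorted index lists. This max-then-top-k rule is an illustrative assumption.
    """
    n_tokens, n_patches = len(attn), len(attn[0])
    token_scores = [max(row) for row in attn]
    patch_scores = [max(attn[i][j] for i in range(n_tokens)) for j in range(n_patches)]
    top_tokens = sorted(range(n_tokens), key=lambda i: token_scores[i], reverse=True)[:k_text]
    top_patches = sorted(range(n_patches), key=lambda j: patch_scores[j], reverse=True)[:k_image]
    return sorted(top_tokens), sorted(top_patches)


# Toy weights: tokens 0 and 2 attend sharply to one patch each, while token 1
# (a detail not visible in this image) spreads its attention evenly.
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.10, 0.10, 0.70],
]
text_mask, image_mask = synchronized_masks(attn, k_text=2, k_image=2)
```

In this reading, token 1 is never selected for masking, so the masked-modeling objectives would not ask the model to recover a detail the paired image cannot supply.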

Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kim, Jaeha; Oh, Junghun; Lee, Kyoung Mu
Affiliations: Seoul Natl Univ Dept ECE, Seoul, South Korea; Seoul Natl Univ ASRI, Seoul, South Korea; Seoul Natl Univ IPAI, Seoul, South Korea

ISBN: (print) 9798350353013; 9798350353006

In real-world scenarios, image recognition tasks, such as semantic segmentation and object detection, often pose greater challenges due to the lack of information available within low-resolution (LR) content. Image super-resolution (SR) is one of the promising solutions for addressing these challenges. However, due to the ill-posed nature of SR, it is challenging for typical SR methods to restore task-relevant high-frequency contents, which may dilute the advantage of utilizing the SR method. Therefore, in this paper, we propose Super-Resolution for Image Recognition (SR4IR) that effectively guides the generation of SR images beneficial to achieving satisfactory image recognition performance when processing LR images. The critical component of our SR4IR is the task-driven perceptual (TDP) loss that enables the SR network to acquire task-specific knowledge from a network tailored for a specific task. Moreover, we propose a cross-quality patch mix and an alternate training framework that significantly enhances the efficacy of the TDP loss by addressing potential problems when employing the TDP loss. Through extensive experiments, we demonstrate that our SR4IR achieves outstanding task performance by generating SR images useful for a specific image recognition task, including semantic segmentation, object detection, and image classification. The implementation code is available at https://***/JaehaKim97/SR4IR.

Keywords: Low-level vision; Perceptual loss; Super-resolution; Task-aware restoration
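The task-driven perceptual (TDP) loss compares the super-resolved (SR) and high-resolution (HR) images in the feature space of a task network instead of pixel space. In the sketch below, a hypothetical horizontal-gradient filter stands in for that frozen task network, and the loss is the mean squared feature difference (squared error is an assumption here); it shows how such a loss can ignore a harmless global brightness shift while penalizing lost edges.

```python
def task_features(image):
    """Hypothetical stand-in for a frozen task network: horizontal intensity
    differences of a grayscale image given as a list of rows."""
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in image]


def tdp_loss(sr, hr, features=task_features):
    """Mean squared difference between task features of the SR and HR images."""
    f_sr, f_hr = features(sr), features(hr)
    sq = [(a - b) ** 2 for row_a, row_b in zip(f_sr, f_hr) for a, b in zip(row_a, row_b)]
    return sum(sq) / len(sq)


def pixel_mse(a, b):
    """Plain pixel-space mean squared error, for comparison."""
    sq = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(sq) / len(sq)


hr = [[0.0, 0.0, 1.0, 1.0]]
sr_brighter = [[0.1, 0.1, 1.1, 1.1]]  # global offset, edge preserved
sr_blurred = [[0.0, 0.5, 0.5, 1.0]]   # edge smeared out
```

With this stand-in, `sr_brighter` has a nonzero pixel error but zero feature loss, while `sr_blurred` is penalized. In SR4IR the features instead come from a network trained for segmentation, detection, or classification.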

Grounding Everything: Emerging Localization Properties in Vision-Language Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bousselham, Walid; Petersen, Felix; Ferrari, Vittorio; Kuehne, Hilde
Affiliations: Univ Bonn, Bonn, Germany; Goethe Univ Frankfurt, Frankfurt, Germany; Stanford Univ, Stanford, CA 94305, USA; Synthesia Io, London, England; MIT IBM Watson AI Lab, Cambridge, MA, USA

ISBN: (print) 9798350353013; 9798350353006

Vision-language foundation models have shown remarkable performance in various zero-shot settings such as image retrieval, classification, or captioning. But so far, those models seem to fall behind when it comes to zero-shot localization of referential expressions and objects in images. As a result, they need to be fine-tuned for this task. In this paper, we show that pretrained vision-language (VL) models allow for zero-shot open-vocabulary object localization without any fine-tuning. To leverage those capabilities, we propose a Grounding Everything Module (GEM) that generalizes the idea of value-value attention introduced by CLIPSurgery [17] to a self-self attention path. We show that the concept of self-self attention corresponds to clustering, thus enforcing groups of tokens arising from the same object to be similar while preserving the alignment with the language space. To further guide the group formation, we propose a set of regularizations that allows the model to finally generalize across datasets and backbones. We evaluate the proposed GEM framework on various benchmark tasks and datasets for semantic segmentation. GEM not only outperforms other training-free open-vocabulary localization methods, but also achieves state-of-the-art results on the recently proposed OpenImagesV7 large-scale segmentation benchmark.

Keywords: CLIP; open-vocabulary; zero-shot segmentation; vision-language model
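The claim that self-self attention acts like clustering can be seen with a few toy tokens. The sketch below uses the tokens themselves (L2-normalized) as both queries and keys, with no learned projections and an arbitrary temperature of 0.1; it illustrates the general mechanism only, not the GEM module, which operates inside a pretrained CLIP model and adds regularizations.

```python
import numpy as np


def self_self_attention(tokens: np.ndarray, temperature: float = 0.1) -> np.ndarray:
    """One step of self-self attention on (N, D) tokens.

    Queries and keys are the L2-normalized tokens themselves (no learned
    projections, an illustrative simplification), so each token is replaced by
    a similarity-weighted average of the tokens it most resembles.
    """
    unit = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    logits = unit @ unit.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens


# Two made-up "objects": tokens 0-1 point roughly along x, tokens 2-3 along y.
tokens = np.array([[1.0, 0.1], [1.0, -0.1], [0.1, 1.0], [-0.1, 1.0]])
updated = self_self_attention(tokens)
```

After one step, tokens from the same toy object move much closer together while the two groups stay far apart, which is the grouping behavior the abstract attributes to self-self attention.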

Logarithmic Lenses: Exploring Log RGB Data for Image Classification

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Maxwell, Bruce A.; Singhania, Sumegha; Patel, Avnish; Kumar, Rahul; Fryling, Heather; Li, Sihan; Sun, Haonan; He, Ping; Li, Zewen
Affiliations: Northeastern Univ, Boston, MA 02115, USA

ISBN: (print) 9798350353006

The design of deep network architectures and training methods in computer vision has been well-explored. However, in almost all cases the images have been used as provided, with little exploration of pre-processing steps beyond normalization and data augmentation. Virtually all images posted on the web or captured by devices are processed for viewing by humans. Is the pipeline used for humans also best for use by computers and deep networks? The human visual system uses logarithmic sensors; differences and sums correspond to ratios and products. Features in log space will be invariant to intensity changes and robust to color balance changes. Log RGB space also reveals structure that is corrupted by typical pre-processing. We explore using linear and log RGB data for training standard backbone architectures on an image classification task using data derived directly from RAW images to guarantee its integrity. We found that networks trained on log RGB data exhibit improved performance on an unmodified test set and invariance to intensity and color balance modifications without additional training or data augmentation. Furthermore, we found that the gains from using high quality log data could also be partially or fully realized from data in 8-bit sRGB-JPG format by inverting the sRGB transform and taking the log. These results imply existing databases may benefit from this type of pre-processing. While working with log data, we found it was critical to retain the integrity of the log relationships and that networks using log data train best with meta-parameters different than those used for sRGB or linear data. Finally, we introduce a new 10-category 10k RAW image data set (RAW10) for image classification and other purposes to further enable the exploration of log RGB as an input format for deep networks in computer vision.

Keywords: computer vision; data set; image classification; physics-based vision
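The abstract's recipe for 8-bit sRGB data, inverting the sRGB transform and then taking the log, can be sketched as follows. The inverse transfer function follows the piecewise sRGB definition (IEC 61966-2-1); the small `eps` guarding log(0) and the function names are assumptions. The last helper shows the intensity invariance mentioned in the abstract: scaling all linear channels by the same factor k adds log k to each, so log differences between channels are unchanged.

```python
import math


def srgb_to_linear(c: float) -> float:
    """Invert the piecewise sRGB transfer function for one channel in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb8_to_log_rgb(pixel, eps=1e-4):
    """Map an 8-bit sRGB pixel (values 0..255) to log linear RGB.

    eps keeps log() finite for black channels; its value is an assumption.
    """
    return tuple(math.log(srgb_to_linear(v / 255.0) + eps) for v in pixel)


def log_chromaticity(linear_rgb):
    """Log ratios of red and blue to green. Scaling all channels by k > 0 adds
    log(k) to every log channel, so these differences do not change."""
    log_r, log_g, log_b = (math.log(c) for c in linear_rgb)
    return (log_r - log_g, log_b - log_g)
```

For example, the linear pixels (0.2, 0.4, 0.1) and (0.06, 0.12, 0.03), the same surface under 30% of the light, have identical `log_chromaticity` values.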

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
