Search Results - Inner Mongolia University Library


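Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016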
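
The search rules above can be sketched as a small query builder. This is only an illustration: the helper names `term`, `combine`, and `group` are hypothetical and not part of the library's site. It assumes nothing beyond the rules stated in the help text (field codes, uppercase operators, one space on each side of an operator).

```python
# Hypothetical helpers for composing advanced-search expressions from the
# field codes in the help text. They are not an API of the library's site.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(code: str, value: str) -> str:
    """Return a single CODE=value term after validating the field code."""
    if code not in FIELD_CODES:
        raise ValueError(f"unknown field code: {code!r}")
    return f"{code}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by one space per side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first search example from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Validating the operator against a fixed uppercase set mirrors the rule that lowercase operators are not accepted.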


Search condition: "Any field=2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024"

11,890 records in total; showing records 561-570

ALINA: Advanced Line Identification and Notation Algorithm


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Khan, Mohammed Abdul Hafeez; Ganeriwala, Parth; Bhattacharyya, Siddhartha; Neogi, Natasha; Muthalagu, Raja

Affiliations: Florida Inst Technol Melbourne FL 32901 USA; NASA Langley Res Ctr Hampton VA 23665 USA; BITS Pilani Dubai Campus Dubai U Arab Emirates

ISBN: (print) 9798350365474

Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annotation framework, Advanced Line Identification and Notation Algorithm (ALINA), which can be used for labeling taxiway datasets that consist of different camera perspectives and variable weather attributes (sunny and cloudy). Additionally, the CIRCular threshoLd pixEl Discovery And Traversal (CIRCLEDAT) algorithm has been proposed, which is an integral step in determining the pixels corresponding to taxiway line markings. Once the pixels are identified, ALINA generates corresponding pixel coordinate annotations on the frame. Using this approach, 60,249 frames from the taxiway dataset AssistTaxi have been labeled. To evaluate the performance, a context-based edge map (CBEM) set was generated manually based on edge features and connectivity. The detection rate after testing the annotated labels with the CBEM set was recorded as 98.45%, attesting to its dependability and effectiveness.

Keywords: aircraft perception; annotation; autonomous driving; computer vision; labeling; line identification; taxiway data


GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Garg, Mallika; Ghosh, Debashis; Pradhan, Pyari Mohan

Affiliations: Indian Inst Technol Dept Elect & Commun Engn Roorkee Uttar Pradesh India

ISBN: (print) 9798350365474

Transformer models have achieved state-of-the-art results in many applications such as NLP and classification, but their exploration in gesture recognition tasks is still limited. We therefore propose a novel GestFormer architecture for dynamic hand gesture recognition. The motivation behind this design is to propose a resource-efficient transformer model, since transformers are computationally expensive and very complex. We propose to use a pooling-based token mixer named PoolFormer, since it uses only a pooling layer, which is a non-parametric layer, instead of quadratic attention. The proposed model also leverages the space-invariant features of the wavelet transform, and the multiscale features are selected using multi-scale pooling. Further, a gated mechanism helps to focus on fine details of the gesture with the contextual information. This enhances the performance of the proposed model compared to the traditional transformer, with fewer parameters, when evaluated on the dynamic hand gesture datasets NVidia Dynamic Hand Gesture and Briareo. To prove the efficacy of the proposed model, we have experimented on single as well as multimodal inputs such as infrared, normals, depth, optical flow and color images. We have also compared the proposed GestFormer in terms of resource efficiency and number of operations. The source code is available at https://***/mallikagarg/GestFormer.

Keywords: Dynamic Gesture Recognition; Gated Feedforward Network; Multi-scale Pooling; Multiscale Wavelet Pooling; PoolFormer


RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yu, Tianyu; Yao, Yuan; Zhang, Haoye; He, Taiwen; Han, Yifeng; Cui, Ganqu; Hu, Jinyi; Liu, Zhiyuan; Zheng, Hai-Tao; Sun, Maosong

Affiliations: Tsinghua Univ Beijing Peoples R China; Natl Univ Singapore Singapore Singapore; Tsinghua Univ Shenzhen Int Grad Sch Beijing Peoples R China; Pengcheng Lab Shenzhen Peoples R China

ISBN: (print) 9798350353006

Multimodal Large Language Models (MLLMs) have recently demonstrated impressive capabilities in multimodal understanding, reasoning, and interaction. However, existing MLLMs prevalently suffer from serious hallucination problems, generating text that is not factually grounded in associated images. The problem makes existing MLLMs untrustworthy and thus impractical in real-world (especially high-stakes) applications. To address the challenge, we present RLHF-V, which enhances MLLM trustworthiness via behavior alignment from fine-grained correctional human feedback. Specifically, RLHF-V collects human preference in the form of segment-level corrections on hallucinations, and performs dense direct preference optimization over the human feedback. Comprehensive experiments on five benchmarks in both automatic and human evaluation show that RLHF-V can enable substantially more trustworthy MLLM behaviors with promising data and computation efficiency. Remarkably, using 1.4k annotated data samples, RLHF-V significantly reduces the hallucination rate of the base MLLM by 34.8%, outperforming the concurrent LLaVA-RLHF trained on 10k annotated data. The final model achieves state-of-the-art performance in trustworthiness among open-source MLLMs, and shows better robustness than GPT-4V in preventing hallucinations arising from over-generalization.

Keywords: hallucination; language; reasoning; vision


Theoretically Achieving Continuous Representation of Oriented Bounding Boxes


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Xiao, Zi-Kai; Yang, Guo-Ye; Yang, Xue; Mul, Tai-Jiang; Yan, Junchi; Hui, Shi-Min

Affiliations: Tsinghua Univ Dept Comp Sci & Technol Beijing Peoples R China; Shanghai Jiao Tong Univ Dept CSE Shanghai Peoples R China; Shanghai Jiao Tong Univ MoE Key Lab AI Shanghai Peoples R China

ISBN: (print) 9798350353006

Considerable efforts have been devoted to Oriented Object Detection (OOD). However, one lasting issue regarding the discontinuity in Oriented Bounding Box (OBB) representation remains unresolved, which is an inherent bottleneck for extant OOD methods. This paper endeavors to completely solve this issue in a theoretically guaranteed manner and puts an end to the ad-hoc efforts in this direction. Prior studies typically can only address one of the two cases of discontinuity, rotation and aspect ratio, and often inadvertently introduce decoding discontinuity, e.g. Decoding Incompleteness (DI) and Decoding Ambiguity (DA), as discussed in the literature. Specifically, we propose a novel representation method called Continuous OBB (COBB), which can be readily integrated into existing detectors, e.g. Faster-RCNN, as a plugin. It can theoretically ensure continuity in bounding box regression, which, to the best of our knowledge, has not been achieved in the literature for rectangle-based object representation. For fairness and transparency of experiments, we have developed a modularized benchmark based on the open-source deep learning framework Jittor's detection toolbox JDet for OOD evaluation. On the popular DOTA dataset, by integrating Faster-RCNN as the same baseline model, our new method outperforms the peer method Gliding Vertex by 1.13% mAP(50) (relative improvement 1.54%), and 2.46% mAP(75) (relative improvement 5.91%), without any tricks.

Keywords: boundary problem; computer vision; object detection; oriented object detection; remote sensing


One Embedding to Predict Them All: Visible and Thermal Universal Face Representations for Soft Biometric Estimation via Vision Transformers

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Mirabet-Herranz, Nelida; Galdi, Chiara; Dugelay, Jean-Luc

Affiliations: EURECOM Campus SophiaTech 450 Route Chappes F-06410 Biot France

ISBN: (print) 9798350365474

Human faces encode a vast amount of information including not only uniquely distinctive features of the individual but also demographic information such as a person's age, gender, and weight. Such information is referred to as soft-biometrics, which are physical, behavioral or adhered human characteristics, classifiable in pre-defined human compliant categories. As we often say, 'one look is worth a thousand words'. Vision Transformers have emerged as a powerful deep learning architecture able to achieve accurate classifications for different computer vision tasks, but these models have not yet been applied to soft-biometrics. In this work, we propose the Bidirectional Encoder Face representation from image Transformers (BEFiT), a model that leverages the multi-attention mechanisms to capture local and global features and produce a multi-purpose face embedding. This unique embedding enables the estimation of different demographics without having to re-train the model for each soft-biometric trait, ensuring high efficiency without compromising accuracy. Our approach explores the use of visible and thermal images to achieve powerful face embedding in different light spectra. We demonstrate that the BEFiT embeddings can capture essential information for gender, age, and weight estimation, surpassing the performance of dedicated deep learning structures for the estimation of a single soft biometric trait. The code of the BEFiT implementation is publicly available.

Keywords: Biometrics


Interpreting COVID Lateral Flow Tests' Results with Foundation Models


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Pandey, Stuti; Myers-Dean, Josh; Reynolds, Jarek; Gurari, Danna

Affiliations: Univ Colorado Boulder CO 80309 USA; Univ Texas Austin Austin TX 78712 USA

ISBN: (print) 9798350365474

Lateral flow tests (LFTs) enable rapid, low-cost testing for health conditions including Covid, pregnancy, HIV, and malaria. Automated readers of LFT results can yield many benefits including empowering blind people to independently learn about their health and accelerating data entry for large-scale monitoring (e.g., for pandemics such as Covid) by using only a single photograph per LFT test. Accordingly, we explore the abilities of modern foundation vision language models (VLMs) in interpreting such tests. To enable this analysis, we first create a new labeled dataset with hierarchical segmentations of each LFT test and its nested test result window. We call this dataset LFT-Grounding. Next, we benchmark eight modern VLMs in zero-shot settings for analyzing these images. We demonstrate that current VLMs frequently fail to correctly identify the type of LFT test, interpret the test results, locate the nested result window of the LFT tests, and recognize LFT tests when they are partially obfuscated. To facilitate community-wide progress towards automated LFT reading, we publicly release our dataset at https://***/ lft_grounding_foundation_models/

Keywords: Accessibility; Foundation Vision Language Models; Lateral Flow Test; Prompt Engineering; Zero-Shot


Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Li, Hao; Chen, Ying; Chen, Yifei; Yu, Rongshan; Yang, Wenxian; Wang, Liansheng; Ding, Bowen; Han, Yuchen

Affiliations: Xiamen Univ Sch Informat Xiamen Peoples R China; Huawei Xiamen Peoples R China; Aginome Sci Xiamen Peoples R China; Shanghai Jiao Tong Univ Shanghai Chest Hosp Dept Pathol Sch Med Shanghai Peoples R China

ISBN: (print) 9798350353006

Whole Slide Image (WSI) classification is often formulated as a Multiple Instance Learning (MIL) problem. Recently, Vision-Language Models (VLMs) have demonstrated remarkable performance in WSI classification. However, existing methods leverage coarse-grained pathogenetic descriptions for visual representation supervision, which are insufficient to capture the complex visual appearance of pathogenetic images, hindering the generalizability of models on diverse downstream tasks. Additionally, processing high-resolution WSIs can be computationally expensive. In this paper, we propose a novel "Fine-grained Visual-Semantic Interaction" (FiVE) framework for WSI classification. It is designed to enhance the model's generalizability by leveraging the interaction between localized visual patterns and fine-grained pathological semantics. Specifically, with meticulously designed queries, we start by utilizing a large language model to extract fine-grained pathological descriptions from various non-standardized raw reports. The output descriptions are then reconstructed into fine-grained labels used for training. By introducing a Task-specific Fine-grained Semantics (TFS) module, we enable prompts to capture crucial visual information in WSIs, which enhances representation learning and augments generalization capabilities significantly. Furthermore, given that pathological visual patterns are redundantly distributed across tissue slices, we sample a subset of visual instances during training. Our method demonstrates robust generalizability and strong transferability, dominantly outperforming the counterparts on the TCGA Lung Cancer dataset with at least 9.19% higher accuracy in few-shot experiments. The code is available at: https://***/ls1rius/WSI FiVE.

Keywords: Fine-Grained; Generalizable; Vision-Language-Model; Visual-Semantic; Whole Slide Image


BEVSpread: Spread Voxel Pooling for Bird's-Eye-View Representation in Vision-based Roadside 3D Object Detection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Wang, Wenjie; Lu, Yehao; Zheng, Guangcong; Zhan, Shuigen; Ye, Xiaoqing; Tan, Zichang; Wang, Jingdong; Wang, Gaoang; Li, Xi

Affiliations: Zhejiang Univ Coll Comp Sci & Technol Hangzhou Zhejiang Peoples R China; Zhejiang Univ Polytech Inst Hangzhou Zhejiang Peoples R China; Baidu Beijing Peoples R China; Zhejiang Singapore Innovat & AI Joint Res Lab Singapore Singapore

ISBN: (print) 9798350353006

Vision-based roadside 3D object detection has attracted rising attention in the autonomous driving domain, since it encompasses inherent advantages in reducing blind spots and expanding perception range. Previous work mainly focuses on accurately estimating depth or height for 2D-to-3D mapping, ignoring the position approximation error in the voxel pooling process. Inspired by this insight, we propose a novel voxel pooling strategy to reduce such error, dubbed BEVSpread. Specifically, instead of bringing the image features contained in a frustum point to a single BEV grid, BEVSpread considers each frustum point as a source and spreads the image features to the surrounding BEV grids with adaptive weights. To achieve superior propagation performance, a specific weight function is designed to dynamically control the decay speed of the weights according to distance and depth. Aided by customized CUDA parallel acceleration, BEVSpread achieves inference time comparable to the original voxel pooling. Extensive experiments on two large-scale roadside benchmarks demonstrate that, as a plug-in, BEVSpread can significantly improve the performance of existing frustum-based BEV methods by a large margin of (1.12, 5.26, 3.01) AP in vehicle, pedestrian and cyclist. The source code will be made publicly available at BEVSpread.

Keywords: 3D Object Detection; Autonomous Driving; BEV


LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Sijin; Chen, Xin; Zhang, Chi; Li, Mingsheng; Yu, Gang; Fei, Hao; Zhu, Hongyuan; Fan, Jiayuan; Chen, Tao

Affiliations: Fudan Univ Shanghai Peoples R China; Tencent PCG Shenzhen Peoples R China; Natl Univ Singapore Singapore Singapore; ASTAR Inst InfoComm Res I2R Singapore Singapore; ASTAR Ctr Frontier AI Res CFAR Singapore Singapore

ISBN: (print) 9798350353006

Recent progress in Large Multimodal Models (LMM) has opened up great possibilities for various applications in the field of human-machine interactions. However, developing LMMs that can comprehend, reason, and plan in complex and diverse 3D environments remains a challenging topic, especially considering the demand for understanding permutation-invariant point cloud representations of the 3D scene. Existing works seek help from multi-view images by projecting 2D features to 3D space, which inevitably leads to huge computational overhead and performance degradation. In this paper, we present LL3DA, a Large Language 3D Assistant that takes point cloud as the direct input and responds to both text instructions and visual interactions. The additional visual interaction enables LMMs to better comprehend human interactions with the 3D environment and further remove the ambiguities within plain texts. Experiments show that LL3DA achieves remarkable results and surpasses various 3D vision-language models on both 3D Dense Captioning and 3D Question Answering.

Keywords: large language models; Multi-modal learning; vision and language


Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld


IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Yang, Yijun; Zhou, Tianyi; Li, Kanxue; Tao, Dapeng; Li, Lusong; Shen, Li; He, Xiaodong; Jiang, Jing; Shi, Yuhui

Affiliations: Southern Univ Sci & Technol Shenzhen Peoples R China; Univ Maryland College Pk MD 20742 USA; Yunnan Univ Kunming Yunnan Peoples R China; JD Explore Acad Beijing Peoples R China; Univ Technol Sydney Sydney NSW Australia

ISBN: (print) 9798350353006

While large language models (LLMs) excel in a simulated world of texts, they struggle to interact with the more realistic world without perceptions of other modalities such as visual or audio signals. Although vision-language models (VLMs) integrate LLM modules (1) aligned with static image features, and (2) may possess prior knowledge of world dynamics (as demonstrated in the text world), they have not been trained in an embodied visual world and thus cannot align with its dynamics. On the other hand, training an embodied agent in a noisy visual world without expert guidance is often challenging and inefficient. In this paper, we train a VLM agent living in a visual world using an LLM agent excelling in a parallel text world. Specifically, we distill LLM's reflection outcomes (improved actions by analyzing mistakes) in a text world's tasks to finetune the VLM on the same tasks of the visual world, resulting in an Embodied Multi-Modal Agent (EMMA) quickly adapting to the visual world dynamics. Such cross-modality imitation learning between the two parallel worlds is achieved by a novel DAgger-DPO algorithm, enabling EMMA to generalize to a broad scope of new tasks without any further guidance from the LLM expert. Extensive evaluations on the ALFWorld benchmark's diverse tasks highlight EMMA's superior performance to SOTA VLM-based agents, e.g., 20%-70% improvement in the success rate.

Keywords: Embodied AI; Imitation Learning; Large Language Models; Vision Language Models


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
