ISBN: 9798350365474 (print)
Deploying deep learning (DL) models for visual recognition on embedded systems is often constrained by their limited compute power and storage capacity, as well as by stringent latency and power requirements. As emerging DL applications continue to evolve, they place increasing demands on computational resources that embedded vision systems cannot provision. One promising solution to overcome these limitations is computation offloading. However, for performance improvements to be realized, it is essential to carefully partition tasks, taking into account both the quality of the data and the communication overhead. In this paper, we introduce a novel framework for content-aware offloading of DL computations, aimed at maximizing quality-of-service while adhering to latency constraints. In our framework, the embedded vision system (edge device) intelligently compresses data in a content-aware manner using a lightweight model and transmits it to a more powerful server. The framework consists of two key components: offline training for efficient content-aware data scaling and online control that adapts to network variations in real time. To illustrate the effectiveness of our approach, we apply it to multiple downstream tasks such as face identification, person keypoint detection, and instance segmentation, showcasing a significant enhancement in the overall quality of results for various applications.
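The abstract gives no implementation details, but the online control component can be illustrated with a minimal sketch: given a measured bandwidth and a per-frame latency budget, pick the most informative content-aware scaling profile that still fits the budget. Every name, profile, and constant below is a hypothetical placeholder, not a value from the paper.

    # Hypothetical sketch of the online control loop: choose the largest
    # content-aware scaling profile whose estimated end-to-end latency
    # (device compression + transmission + server inference) fits the budget.
    from dataclasses import dataclass

    @dataclass
    class ScalingOption:
        scale: float          # fraction of the original resolution kept
        bytes_per_frame: int  # expected compressed size at this scale
        server_ms: float      # profiled server-side inference latency
        quality: float        # expected task quality (e.g., mAP) at this scale

    def pick_scale(options, bandwidth_bps, latency_budget_ms, device_ms=5.0):
        """Return the option with the best expected quality that meets the budget."""
        feasible = []
        for opt in options:
            tx_ms = opt.bytes_per_frame * 8 / bandwidth_bps * 1000.0
            if device_ms + tx_ms + opt.server_ms <= latency_budget_ms:
                feasible.append(opt)
        if not feasible:                      # fall back to the cheapest option
            return min(options, key=lambda o: o.bytes_per_frame)
        return max(feasible, key=lambda o: o.quality)

    # Example: adapt to the currently measured bandwidth each frame.
    profiles = [
        ScalingOption(1.00, 120_000, 18.0, 0.92),
        ScalingOption(0.50,  40_000, 12.0, 0.88),
        ScalingOption(0.25,  15_000,  9.0, 0.80),
    ]
    choice = pick_scale(profiles, bandwidth_bps=10_000_000, latency_budget_ms=60.0)
    print(choice.scale)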
ISBN: 9798350365474 (print)
The ubiquity of vision transformers (ViTs) for various edge applications, including personalized learning, has created demand for on-device fine-tuning. However, training with the limited memory and computation power of edge devices remains a significant challenge. In particular, the memory required for training is much higher than that needed for inference, primarily due to the need to store activations across all layers in order to compute the gradients for weight updates. Previous works have explored reducing this memory requirement via frozen-weight training as well as by storing the activations in a compressed format. However, these methods are inefficient because they provide no training or inference speedup. In this paper, we first investigate the limitations of existing on-device training methods aimed at reducing memory and compute requirements. We then present block selective reprogramming (BSR), in which we fine-tune only a fraction of the blocks of a pre-trained model and selectively drop tokens based on the self-attention scores of the frozen layers. To show the efficacy of BSR, we present extensive evaluations on ViT-B and DeiT-S with five different datasets. Compared to the existing alternatives, our approach simultaneously reduces training memory by up to 1.4x and compute cost by up to 2x while maintaining similar accuracy. We also showcase results for Mixture-of-Experts (MoE) models, demonstrating the effectiveness of our approach in multitask learning scenarios. Code will be available at: https://***/sreetamasarkar/BSR.
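A minimal sketch of the two ideas named in the abstract (not the authors' code) is shown below: freezing all but a few transformer blocks, and keeping only the patch tokens with the highest [CLS] attention from a frozen layer. It assumes a timm-style ViT exposing `blocks` and `head` attributes; the exact selection rule in BSR may differ.

    import torch

    def freeze_all_but_last(model, num_trainable_blocks=2):
        """Freeze every parameter, then re-enable the last few blocks and the head."""
        for p in model.parameters():
            p.requires_grad = False
        for blk in model.blocks[-num_trainable_blocks:]:
            for p in blk.parameters():
                p.requires_grad = True
        for p in model.head.parameters():
            p.requires_grad = True

    def drop_tokens(tokens, attn, keep_ratio=0.5):
        """Keep the patch tokens most attended to by the [CLS] token.

        tokens: (B, 1 + N, D) with the [CLS] token first.
        attn:   (B, H, 1 + N, 1 + N) attention weights from a frozen layer.
        """
        cls_attn = attn[:, :, 0, 1:].mean(dim=1)             # (B, N) CLS->patch scores
        num_keep = max(1, int(keep_ratio * cls_attn.shape[1]))
        keep_idx = cls_attn.topk(num_keep, dim=1).indices    # (B, num_keep)
        patches = tokens[:, 1:, :]
        gathered = torch.gather(
            patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
        return torch.cat([tokens[:, :1, :], gathered], dim=1)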
ISBN: 9798350365474 (print)
Retrieval-augmented generation (RAG) is used in natural language processing (NLP) to provide query-relevant information from enterprise documents to large language models (LLMs). Such enterprise context enables the LLMs to generate more informed and accurate responses. When enterprise data is primarily videos, AI models like vision language models (VLMs) are necessary to convert the information in videos into text. While essential, this conversion is a bottleneck, especially for a large corpus of videos, and it delays the timely use of enterprise videos to generate useful responses. We propose ViTA, a novel method that leverages two unique characteristics of VLMs to expedite the conversion process. First, as VLMs output more text tokens, they incur higher latency. Second, large (heavyweight) VLMs can extract intricate details from images and videos, but they incur much higher latency per output token compared to smaller (lightweight) VLMs, which may miss details. To expedite conversion, ViTA first employs a lightweight VLM to quickly understand the gist or overview of an image or a video clip, and then directs a heavyweight VLM (through prompt engineering) to extract additional details using only a few (a preset number of) output tokens. Our experimental results show that ViTA expedites the conversion time by as much as 43% without compromising the accuracy of responses, compared to a baseline system that only uses a heavyweight VLM.
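The two-stage conversion can be sketched as below. The `light_vlm` and `heavy_vlm` callables are placeholders for any lightweight/heavyweight vision-language models; the prompt wording and token budget are illustrative, not the paper's exact settings.

    def describe_clip(frames, light_vlm, heavy_vlm, max_detail_tokens=64):
        """Return a text description of a video clip for RAG indexing."""
        # Stage 1: a fast, lightweight VLM produces a short gist of the clip.
        gist = light_vlm(frames, prompt="Briefly describe what happens in this clip.")

        # Stage 2: the heavyweight VLM is steered by the gist and capped to a
        # small, preset number of output tokens, since its latency grows with
        # every token it generates.
        detail_prompt = (
            f"The clip is roughly about: {gist}\n"
            "Add only the important missing details (objects, text, actions)."
        )
        details = heavy_vlm(frames, prompt=detail_prompt,
                            max_new_tokens=max_detail_tokens)
        return f"{gist} {details}"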
ISBN: 9798350365474 (print)
Inspired by the remarkable progress achieved by recent Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) take LLMs as their brains and have achieved surprising results in many downstream tasks by training on large amounts of task-specific data. However, when faced with complex tasks that require the collaboration of multiple capabilities, existing MLLMs re-collect training data and retrain the model, ignoring the systematic utilization of LLMs and the capabilities the models have already learned on downstream tasks. Inspired by the way humans tackle complex questions, in this paper we propose a novel framework called Task Navigator. In our framework, LLMs act as navigators that chart a viable path for solving complex tasks and guide MLLMs through the process step by step. Specifically, the LLMs iteratively break the task down into sub-problems and refine them to be more reasonable and answerable; these sub-problems are subsequently resolved by MLLMs to obtain relevant sub-answers, until the LLMs have collected enough information to answer the initial question. Task Navigator provides an effective way to extend MLLMs to tackle complex tasks, thus broadening MLLMs' applicability. To evaluate the performance of the proposed framework, we have curated a carefully designed benchmark called VersaChallenge. Experiments on VersaChallenge demonstrate the effectiveness of our proposed method.
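A schematic of the navigator loop described above, with placeholder `llm_navigator` and `mllm` callables; the prompts and stopping check are illustrative and may not match the paper's actual prompting strategy.

    def task_navigator(question, image, llm_navigator, mllm, max_steps=5):
        """LLM plans sub-questions; the MLLM answers them until enough is known."""
        history = []
        for _ in range(max_steps):
            # The navigator proposes (and refines) the next answerable sub-question,
            # given the original question and the sub-answers collected so far.
            sub_q = llm_navigator(
                f"Question: {question}\nKnown: {history}\n"
                "Propose the next simple, answerable sub-question, or say DONE.")
            if sub_q.strip().upper() == "DONE":
                break
            sub_a = mllm(image, sub_q)          # visual sub-answer from the MLLM
            history.append((sub_q, sub_a))
        # Final answer composed by the navigator from the collected evidence.
        return llm_navigator(
            f"Question: {question}\nEvidence: {history}\nGive the final answer.")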
ISBN: 9798350365474 (print)
Though diffusion models have been successfully applied to various image restoration (IR) tasks, their performance is sensitive to the choice of training datasets. Typically, diffusion models trained on specific datasets fail to recover images with out-of-distribution degradations. To address this problem, this work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR). More specifically, all low-quality images are simulated with a synthetic degradation pipeline that contains multiple common degradations such as blur, resizing, noise, and JPEG compression. We then introduce robust training for a degradation-aware CLIP model to extract enriched image content features that assist high-quality image restoration. Our base diffusion model is the image restoration SDE (IR-SDE). Building on it, we further present a posterior sampling strategy for fast noise-free image generation. We evaluate our model on both synthetic and real-world degradation datasets. Moreover, experiments on the unified image restoration task illustrate that the proposed posterior sampling improves image generation quality for various degradations.
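A minimal synthetic degradation pipeline in the spirit described above (blur, downscale/upscale, noise, JPEG), using Pillow and NumPy. The parameter ranges are illustrative placeholders; the paper's pipeline and settings may differ.

    import io
    import random
    import numpy as np
    from PIL import Image, ImageFilter

    def degrade(img: Image.Image) -> Image.Image:
        w, h = img.size
        # Gaussian blur with a random radius.
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.5, 3.0)))
        # Random downscale, then upscale back to the original size.
        scale = random.uniform(0.25, 0.75)
        img = img.resize((max(1, int(w * scale)), max(1, int(h * scale))), Image.BICUBIC)
        img = img.resize((w, h), Image.BICUBIC)
        # Additive Gaussian noise.
        arr = np.asarray(img).astype(np.float32)
        arr += np.random.normal(0.0, random.uniform(1.0, 15.0), arr.shape)
        img = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
        # JPEG compression at a random quality.
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=random.randint(30, 90))
        return Image.open(io.BytesIO(buf.getvalue())).convert("RGB")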
ISBN: 9798350365474 (print)
We propose a weakly supervised approach for creating maps from free-form textual descriptions. We refer to this task of creating textual maps as zero-shot mapping. Prior works have approached mapping by developing models that predict a fixed set of attributes from overhead imagery. However, these models are very restrictive, as they can only solve the highly specific tasks for which they were trained. Mapping text, on the other hand, allows us to solve a large variety of mapping problems with minimal restrictions. To achieve this, we train a contrastive learning framework called Sat2Cap on a new large-scale dataset with 6.1M pairs of overhead and ground-level images. For a given location and overhead image, our model predicts the expected CLIP embedding of the ground-level scenery. The predicted CLIP embeddings are then used to learn about the textual space associated with that location. Sat2Cap is also conditioned on date-time information, allowing it to model temporally varying concepts over a location. Our experimental results demonstrate that our models successfully capture ground-level concepts and allow large-scale mapping of fine-grained textual queries. Our approach does not require any text-labeled data, making the training easily scalable. The code, dataset, and models will be made publicly available.
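One way the predicted CLIP embeddings could be used for zero-shot mapping is sketched below: score every overhead image against a free-form text query in CLIP space. `sat2cap` and `clip_text_encoder` are placeholders for the trained model and a frozen CLIP text encoder; the shapes and interface are assumptions for illustration only.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def text_query_map(overhead_images, datetimes, query, sat2cap, clip_text_encoder):
        """Return one relevance score per location for a free-form text query."""
        # Predicted ground-level CLIP embeddings for each overhead image,
        # conditioned on date-time metadata: (N, D).
        ground_emb = F.normalize(sat2cap(overhead_images, datetimes), dim=-1)
        # CLIP text embedding of the query: (1, D).
        text_emb = F.normalize(clip_text_encoder([query]), dim=-1)
        # Cosine similarity gives a per-location score that can be rendered as a map.
        return (ground_emb @ text_emb.T).squeeze(-1)        # (N,)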
ISBN: 9798350365474 (print)
Face morphing attacks pose severe threats to Face Recognition Systems (FRS), which are operated in border control and passport issuance use cases. Correspondingly, morphing attack detection (MAD) algorithms are needed to defend against such attacks. MAD approaches must be robust enough to handle unknown attacks in an open-set scenario, where attacks can originate from various morphing generation algorithms, post-processing steps, and a diversity of printers/scanners. The generalization problem is further pronounced when the detection has to be made on a single suspected image. In this paper, we propose a generalized single-image-based MAD (S-MAD) algorithm that learns the encoding from a Vision Transformer (ViT) architecture. Compared to CNN-based architectures, the ViT model has the advantage of integrating local and global information and is therefore well suited to detecting morphing traces that are widely distributed across the face region. Extensive experiments are carried out on face morphing datasets generated using the publicly available FRGC face datasets. Several state-of-the-art (SOTA) MAD algorithms, including representative ones that have been publicly evaluated, have been selected and benchmarked against our ViT-based approach. The obtained results demonstrate the improved detection performance of the proposed S-MAD method under the inter-dataset testing protocol (different data used for training and testing) and comparable performance under the intra-dataset testing protocol (the same data used for training and testing).
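A minimal illustration (not the authors' implementation) of the core setup: a ViT backbone encodes the suspected face image and a binary head scores it as bona fide vs. morph. It uses timm's ViT-Base for concreteness; the learning rate, transforms, and training details are omitted or illustrative.

    import timm
    import torch
    import torch.nn as nn

    model = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=2)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    def train_step(images, labels):
        """images: (B, 3, 224, 224) normalized faces; labels: 0 = bona fide, 1 = morph."""
        logits = model(images)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()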
ISBN: 9798350365474 (print)
The increasing complexity of traffic dynamics has underscored the necessity for advanced traffic safety description and analysis, challenging the efficacy of current methodologies in comprehensively understanding and predicting safety conditions from transportation videos. This paper addresses these challenges by analyzing the specific segments that are crucial for precise traffic safety descriptions. Through this examination, we introduce an innovative preprocessing method named "segment extraction", facilitating the creation of a novel segment-based training dataset. Additionally, we present a practical two-stage training framework specifically tailored to this dataset. This framework is designed to produce accurate descriptions of traffic safety by incorporating the unique attributes of our segment-based training dataset. Leveraging these contributions, our method achieved a notable 2nd rank with a score of 32.8877 on the test set of Track 2 (Traffic Safety Description and Analysis) of the 2024 AI City Challenge. The source code for the proposed approaches is openly accessible at https://***/AIVIETNAMResearch/AI-CIty2024-Track2
ISBN: 9798350365474 (print)
Autism spectrum disorder (ASD) is a neurodevelopmental disorder. Early detection and diagnosis are instrumental in early intervention, yet diagnosis often remains delayed due to the limited availability of clinical practitioners and specialists. We propose a novel computer vision and machine learning based framework for quantitative screening of ASD. It aims to minimize the need for trained professionals at the initial screening stage, not to substitute for them. We designed simple activities, in consultation with ASD clinical psychologists and therapists, for children in the 3-7 years age group that can be performed in their natural environment (home). The temporal features extracted from these activities encode the behavioral differences between Autism Spectrum Disorder (ASD) and Typically Developing (TD) control groups. Due to the unavailability of a public dataset of children performing the designed tasks, we created our own dataset of 210 videos taken in unconstrained natural settings with a single RGB camera. The proposed vision and learning-based algorithms extract features from the collected data for a comprehensive set of indicators, including visual attention span, response to name-calling, neck pose, and gross motor movement, and establish a parametrized, automated protocol for early detection without the need to take the subjects out of their natural daily environment. This forestalls the possibility of the subject underperforming out of nervousness in unfamiliar surroundings. Results show that our ASD screening methodology achieves superior performance compared to single-phenotype approaches, and thus has prognostic value that could be helpful for both clinical and research applications.
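For illustration only: one simple way to combine the per-video behavioral indicators listed above into a single feature vector and fit a screening classifier. The feature names and the classifier choice are placeholders, not the paper's exact protocol.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    FEATURES = ["attention_span_s", "name_call_response_delay_s",
                "neck_pose_variation", "gross_motor_activity"]

    def to_vector(indicators: dict) -> np.ndarray:
        """Map the extracted per-video indicators to a fixed-length feature vector."""
        return np.array([indicators[name] for name in FEATURES], dtype=np.float32)

    def fit_screener(X: np.ndarray, y: np.ndarray) -> RandomForestClassifier:
        """X: (num_children, num_features) feature vectors; y: 1 = ASD, 0 = TD."""
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X, y)
        return clf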
ISBN: 9798350365474 (print)
Neural radiance fields (NeRFs) have emerged in the field of autonomous driving, where they improve perception of complex 3D environments through the reconstruction of geometry and appearance. Moving objects and the sky in outdoor environments make the NeRF model challenging to optimize. Previous work addresses these challenges through preprocessing such as masking; however, the masking process requires additional ground-truth data and a segmentation network. We propose DiCo-NeRF, an approach for driving scenes that leverages cosine similarity map differences from a vision-language-aligned model. DiCo-NeRF investigates the correlation between rendered patches and pre-defined text and adjusts the loss of challenging patches, such as moving objects and the sky. Our neural radiance field utilizes embedding vectors from a pre-trained CLIP model to obtain the cosine similarity maps. We introduce SimLoss, a loss function aimed at regulating the color field of the NeRF based on the quantified distribution differences between ground-truth and rendered similarity maps. Unlike previous NeRF models that use driving datasets, our approach does not require additional inputs, such as sensor data, to the model. Experimental results demonstrate that incorporating language semantic cues improves the performance of the novel view synthesis task, particularly in complex driving environments. We conducted experiments that include fisheye driving scenes from KITTI360 and real-world datasets. Our code is available at https://***/ziiho08/DiCoNeRF.
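A speculative sketch of how a similarity-based weighting could regulate the NeRF color loss along the lines of the abstract. The exact formulation of SimLoss is not given here; this only illustrates comparing CLIP cosine similarity maps of rendered and ground-truth patches against fixed text embeddings and using the discrepancy to modulate the photometric loss.

    import torch
    import torch.nn.functional as F

    def sim_maps(patch_embeds, text_embeds):
        """Cosine similarity of each patch embedding to each pre-defined text prompt."""
        return F.normalize(patch_embeds, dim=-1) @ F.normalize(text_embeds, dim=-1).T

    def sim_weighted_color_loss(rendered_rgb, gt_rgb, rendered_emb, gt_emb, text_emb,
                                alpha=1.0):
        """Down-weight patches whose rendered/GT similarity maps disagree
        (e.g., moving objects, sky) instead of forcing the color field to fit them.
        Shapes: (P, 3) for per-patch RGB, (P, D) for per-patch embeddings."""
        diff = (sim_maps(rendered_emb, text_emb) - sim_maps(gt_emb, text_emb)).abs()
        weight = torch.exp(-alpha * diff.mean(dim=-1))      # (P,), ~1 where maps agree
        per_patch_rgb = ((rendered_rgb - gt_rgb) ** 2).mean(dim=-1)
        return (weight * per_patch_rgb).mean()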