Search Results - Inner Mongolia University Library

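Search help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016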
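
The query syntax described in the help text can be assembled programmatically. The following is a minimal sketch in Python, not part of the library's system: the helper names `term` and `combine` are hypothetical, and only the field codes and operator rules are taken from the help text. It rebuilds the first search example.

```python
# Hypothetical helpers for illustration; only the field codes and the
# uppercase, space-padded operators come from the catalogue's help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join terms with an uppercase operator surrounded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError("operator must be AND, OR, or NOT (uppercase)")
    joined = f" {operator} ".join(parts)
    # Parentheses keep an OR group together inside a larger AND expression.
    return f"({joined})" if group else joined


subjects = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Validating the field code and operator up front mirrors the two rules the help text states explicitly; how the catalogue itself handles malformed queries is not documented here.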

Search condition: "Any field = 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2013"

4,488 records in total; showing records 41-50


Rugby Scene Classification Enhanced by Vision Language Model

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Nonaka, Naoki; Fujihira, Ryo; Koshiba, Toshiki; Maeda, Akira; Seita, Jun

Affiliations: RIKEN Informat R&D & Strategy Headquarters Adv Data Sci Project Wako Saitama Japan; Hakata Knee & Sports Clin Fukuoka Japan

ISBN: 9798350365474 (print)

This study investigates the integration of vision language models (VLM) to enhance the classification of situations within rugby match broadcasts. The importance of accurately identifying situations in sports videos is emphasized for understanding game dynamics and facilitating downstream tasks like performance evaluation and injury prevention. Utilizing a dataset comprising 18,000 labeled images extracted at 0.2-second intervals from 100 minutes of rugby match broadcasts, scene classification tasks including contact plays (scrums, mauls, rucks, tackles, lineouts), rucks, tackles, lineouts, and multiclass classification were performed. The study aims to validate the utility of VLM outputs in improving classification performance compared to using solely image data. Experimental results demonstrate substantial performance improvements across all tasks with the incorporation of VLM outputs. Our analysis of prompts suggests that, when provided with appropriate contextual information through natural language, VLMs can effectively capture the context of a given image. The findings of our study indicate that leveraging VLMs in the domain of sports analysis holds promise for developing image processing models capable of incorporating the tacit knowledge encoded within language models, as well as information conveyed through natural language descriptions.

Keywords: Rugby; Scene classification; Vision language model

Hairy Ground Truth Enhancement for Semantic Segmentation

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Fischer, Sophie; Voiculescu, Irina

Affiliations: Univ Oxford Dept Comp Sci Oxford England

ISBN: 9798350365474 (print)

Semantic segmentation is a key task within applications of machine learning for medical imaging, requiring large amounts of medical scans annotated by clinicians. The high cost of data annotation means that models need to make the most of all available ground truth masks; yet many models consider two false positive (or false negative) pixel predictions as 'equally wrong' regardless of the individual pixels' relative position to the ground truth mask. These methods also have no sense of whether a pixel is solitary or belongs to a contiguous group. We propose the Hairy transform, a novel method to enhance ground truths using 3D 'hairs' to represent each pixel's position relative to objects in the ground truth. We illustrate its effectiveness using a mainstream model and loss function on a commonly used cardiac MRI dataset, as well as a set of synthetic data constructed to highlight the effect of the method during training. The overall improvement in segmentation results comes at the small cost of a one-off pre-processing step, and can easily be integrated into any standard machine learning model. Rather than looking to make minute improvements for mostly correct 'standard' masks, we instead show how this method helps improve robustness against catastrophic failures for edge cases.

Keywords: Computer vision; Ground Truth Enhancement; Machine Learning; Medical Imaging

Strategies to Improve Real-World Applicability of Laparoscopic Anatomy Segmentation Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kolbinger, Fiona R.; He, Jiangpeng; Ma, Jinge; Zhu, Fengqing

Affiliations: Purdue Univ W Lafayette IN 47907 USA

ISBN: 9798350365474 (print)

Accurate identification and localization of anatomical structures of varying size and appearance in laparoscopic imaging are necessary to leverage the potential of computer vision techniques for surgical decision support. Segmentation performance of such models is traditionally reported using metrics of overlap such as IoU. However, imbalanced and unrealistic representation of classes in the training data and suboptimal selection of reported metrics have the potential to skew nominal segmentation performance and thereby ultimately limit clinical translation. In this work, we systematically analyze the impact of class characteristics (i.e., organ size differences), training and test data composition (i.e., representation of positive and negative examples), and modeling parameters (i.e., foreground-to-background class weight) on eight segmentation metrics: accuracy, precision, recall, IoU, F1 score (Dice Similarity Coefficient), specificity, Hausdorff Distance, and Average Symmetric Surface Distance. Our findings support two adjustments to account for data biases in surgical data science: First, training on datasets that are similar to the clinical real-world scenarios in terms of class distribution, and second, class weight adjustments to optimize segmentation model performance with regard to metrics of particular relevance in the respective clinical setting.

Keywords: Class Imbalance; Computer-Assisted Surgery; Laparoscopic Surgery; Semantic Segmentation; Surgical Data Science
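
For readers unfamiliar with the overlap metrics named in this abstract, the sketch below computes six of them from pixel-level confusion counts using the standard textbook definitions. It is an illustration only, not the authors' code: the function name `segmentation_metrics` and the example counts are assumptions, and Hausdorff Distance and Average Symmetric Surface Distance are omitted because they require boundary geometry rather than counts.

```python
def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard count-based metrics for a binary segmentation mask.

    tp, fp, fn, tn are pixel counts: true/false positives and negatives.
    """

    def ratio(numerator: int, denominator: int) -> float:
        # Return 0.0 for an empty denominator instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "iou": ratio(tp, tp + fp + fn),
        "f1_dice": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts for a small structure in a 100 x 100 pixel frame:
# most pixels are background, so true negatives dominate.
metrics = segmentation_metrics(tp=50, fp=50, fn=50, tn=9850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy and specificity are both about 0.99 while IoU is about 0.33 and F1/Dice is 0.5, which illustrates how metric choice can change the apparent performance on a small, rarely present class.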


ALINA: Advanced Line Identification and Notation Algorithm

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Khan, Mohammed Abdul Hafeez; Ganeriwala, Parth; Bhattacharyya, Siddhartha; Neogi, Natasha; Muthalagu, Raja

Affiliations: Florida Inst Technol Melbourne FL 32901 USA; NASA Langley Res Ctr Hampton VA 23665 USA; BITS Pilani Dubai Campus Dubai U Arab Emirates

ISBN: 9798350365474 (print)

Labels are the cornerstone of supervised machine learning algorithms. Most visual recognition methods are fully supervised, using bounding boxes or pixel-wise segmentations for object localization. Traditional labeling methods, such as crowd-sourcing, are prohibitive due to cost, data privacy, amount of time, and potential errors on large datasets. To address these issues, we propose a novel annotation framework, Advanced Line Identification and Notation Algorithm (ALINA), which can be used for labeling taxiway datasets that consist of different camera perspectives and variable weather attributes (sunny and cloudy). Additionally, the CIRCular threshoLd pixEl Discovery And Traversal (CIRCLEDAT) algorithm has been proposed, which is an integral step in determining the pixels corresponding to taxiway line markings. Once the pixels are identified, ALINA generates corresponding pixel coordinate annotations on the frame. Using this approach, 60,249 frames from the taxiway dataset, AssistTaxi, have been labeled. To evaluate the performance, a context-based edge map (CBEM) set was generated manually based on edge features and connectivity. The detection rate after testing the annotated labels with the CBEM set was recorded as 98.45%, attesting to its dependability and effectiveness.

Keywords: aircraft perception; annotation; autonomous driving; computer vision; labeling; line identification; taxiway data

De-noised Vision-language Fusion Guided by Visual Cues for E-commerce Product Search

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hu, Zhizhang; Li, Shasha; Du, Ming; Dhua, Arnab; Gray, Douglas

Affiliations: Univ Calif Merced Merced CA 95343 USA; Amazon Visual Search & AR Seattle WA USA; Amazon Seattle WA USA

ISBN: 9798350365474 (print)

In e-commerce applications, vision-language multimodal transformer models play a pivotal role in product search. The key to successfully training a multimodal model lies in the alignment quality of image-text pairs in the dataset. However, the data in practice is often automatically collected with minimal manual intervention. Hence the alignment of image-text pairs is far from ideal. In e-commerce, this misalignment can stem from noisy and redundant non-visual-descriptive text attributes in the product description. To address this, we introduce the MultiModal alignment-guided Learned Token Pruning (MM-LTP) method. MM-LTP employs token pruning, conventionally used for computational efficiency, to perform online text cleaning during multimodal model training. By enabling the model to discern and discard unimportant tokens, it is able to train with implicitly cleaned image-text pairs. We evaluate MM-LTP using a benchmark multimodal e-commerce dataset comprising over 710,000 unique Amazon products. Our evaluation hinges on visual search, a prevalent e-commerce feature. Through MM-LTP, we demonstrate that refining text tokens enhances the paired image branch's training, which leads to significantly improved visual search performance.

Keywords: De-noised Fusion; Multimodal Learning; Token Pruning; Vision-language Model; Visual Search

Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Shoman, Maged; Wang, Dongdong; Aboah, Armstrong; Abdel-Aty, Mohamed

Affiliations: Univ Cent Florida Dept Civil Environm & Construct Engn Smart & Safe Transportat SST Lab Orlando FL 32816 USA; North Dakota State Univ Dept Civil Construct & Environm Engn Fargo ND USA; Univ Cent Florida Joint Appointment Dept Comp Sci Dept Civil Environm & Construct Engn Smart & Safe Transportat SST Lab Orlando FL USA

ISBN: 9798350365474 (print)

This paper introduces our solution for Track 2 in AI City Challenge 2024. The task aims to solve traffic safety description and analysis with the dataset of Woven Traffic Safety (WTS), a real-world Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding. Our solution mainly focuses on the following points: 1) To solve dense video captioning, we leverage the framework of dense video captioning with parallel decoding (PDVC) to model visual-language sequences and generate dense caption by chapters for video. 2) Our work leverages CLIP to extract visual features to more efficiently perform cross-modality training between visual and textual representations. 3) We conduct domain-specific model adaptation to mitigate domain shift problem that poses recognition challenge in video understanding. 4) Moreover, we leverage BDD-5K captioned videos to conduct knowledge transfer for better understanding WTS videos and more accurate captioning. Our solution has yielded on the test set, achieving 6th place in the competition. The opensource code will be available at https://***/UCF-SST-Lab/AICity2024cvprw

Keywords: computer vision; cross-modality learning; dense video captioning; domain adaptation; natural language processing

ZInD-Tell: Towards Translating Indoor Panoramas into Descriptions

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Deb, Tonmoay; Wang, Lichen; Bessinger, Zachary; Khosravan, Naji; Penner, Eric; Kang, Sing Bing

Affiliations: Northwestern Univ Evanston IL 60208 USA; Zillow Grp Seattle WA USA

ISBN: 9798350365474 (print)

This paper focuses on bridging the gap between natural language descriptions, 360-degree panoramas, room shapes, and layouts/floorplans of indoor spaces. To enable new multimodal (image, geometry, language) research directions in indoor environment understanding, we propose a novel extension to the Zillow Indoor Dataset (ZInD) which we call ZInD-Tell. We first introduce an effective technique for extracting geometric information from ZInD's raw structural data, which facilitates the generation of accurate ground truth descriptions using GPT-4. A human-in-the-loop approach is then employed to ensure the quality of these descriptions. To demonstrate the vast potential of our dataset, we introduce the ZInD-Tell benchmark, focusing on two exemplary tasks: language-based home retrieval and indoor description generation. Furthermore, we propose an end-to-end, zero-shot baseline model, ZInD-Agent, designed to process an unordered set of panorama images and generate home descriptions. ZInD-Agent outperforms naive methods in both tasks, hence, can be considered as a complement to the naive to show potential use of the data and impact of geometry. We believe this work initiates new trajectories in leveraging computer vision techniques to analyze indoor panorama images descriptively by learning the latent relation between vision, geometry, and language modalities.

Keywords: Computer vision

CUE-Net: Violence Detection Video Analytics with Spatial Cropping, Enhanced UniformerV2 and Modified Efficient Additive Attention

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Senadeera, Damith Chamalke; Yang, Xiaoyun; Kollias, Dimitrios; Slabaugh, Gregory

Affiliations: Queen Mary Univ London Sch Elect Engn & Comp Sci London England; Queen Marys Digital Environm Res Inst DERI London England; Remark AI UK Ltd London England

ISBN: 9798350365474 (print)

In this paper we introduce CUE-Net, a novel architecture designed for automated violence detection in video surveillance. As surveillance systems become more prevalent due to technological advances and decreasing costs, the challenge of efficiently monitoring vast amounts of video data has intensified. CUE-Net addresses this challenge by combining spatial Cropping with an enhanced version of the UniformerV2 architecture, integrating convolutional and self-attention mechanisms alongside a novel Modified Efficient Additive Attention mechanism (which reduces the quadratic time complexity of self-attention) to effectively and efficiently identify violent activities. This approach aims to overcome traditional challenges such as capturing distant or partially obscured subjects within video frames. By focusing on both local and global spatio-temporal features, CUE-Net achieves state-of-the-art performance on the RWF-2000 and RLVS datasets, surpassing existing methods. The source code is available at (1).

Keywords: Computer vision; Cropping; Deep Learning; Efficient Additive Attention; UniFormerV2; Video Analytics; Violence Detection

Domain Targeted Synthetic Plant Style Transfer using Stable Diffusion, LoRA and ControlNet

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Hartley, Zane K. J.; Lind, Rob J.; Pound, Michael P.; French, Andrew P.

Affiliations: Univ Nottingham Wollaton Rd Nottingham NG8 1BB England; Syngenta Jealotts Hill Int Res Ctr Warfield England

ISBN: 9798350365474 (print)

Synthetic images can help alleviate much of the cost in the creation of training data for plant phenotyping-focused AI development. Synthetic-to-real style transfer is of particular interest to users of artificial data because of the domain shift problem created by training neural networks on images generated in a digital environment. In this paper we present a pipeline for synthetic plant creation and image-to-image style transfer, with a particular interest in synthetic-to-real domain adaptation targeting specific real datasets. Utilizing new advances in generative AI, we employ a combination of Stable Diffusion, Low Ranked Adapters (LoRA) and ControlNets to produce an advanced system of style transfer. We focus our work on the core task of leaf instance segmentation, exploring both synthetic-to-real style transfer and inter-species style transfer, and find that our pipeline makes numerous improvements over CycleGAN for style transfer, and the images we produce are comparable to real images when used as training data.

Keywords: Agriculture; Computer vision; ControlNet; Deep Learning; Diffusion; LoRA; Plant Phenotyping

Classifier Guided Cluster Density Reduction for Dataset Selection

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chang, Cheng; Long, Keyu; Li, Zijian; Rai, Himanshu

Affiliations: Layer 6 AI Toronto ON Canada

ISBN: 9798350365474 (print)

In this paper, we address the challenge of selecting an optimal dataset from a source pool with annotations to enhance performance on a target dataset derived from a different source. This is important in scenarios where it is hard to afford on-the-fly dataset annotation and is also the theme of the second Visual Data Understanding (VDU) Challenge. Our solution, the Classifier Guided Cluster Density Reduction (CCDR) framework, operates in two stages. Initially, we employ a filtering technique to identify images that align with the target dataset's distribution. Subsequently, we implement a graph-based cluster density reduction method, steered by a classifier that approximates the distance between the target distribution and source distribution. This classifier is trained to distinguish between images that resemble the target dataset and those that do not, facilitating the pruning process shown in Figure 1. Our approach maintains a balance between selecting pertinent images that match the target distribution and eliminating redundant ones that do not contribute to the enhancement of the detection model. We demonstrate the superiority of our method over various baselines in object detection tasks, particularly in optimizing the training set distribution on the region100 dataset. We have released our code here: https://***/himsR/DataCVChallenge-2024/tree/main

Keywords: Computer vision; Data Search; Deep learning; Domain Transfer


Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
