ISBN: (Print) 9798350365474
In the field of robotics and autonomous navigation, accurate pixel-level depth estimation has gained significant importance. Event cameras, or dynamic vision sensors, capture asynchronous changes in brightness at the pixel level, offering benefits such as high temporal resolution, no motion blur, and a wide dynamic range. However, unlike traditional cameras that measure absolute intensity, event cameras lack the ability to provide scene context. Efficiently combining the advantages of asynchronous events and synchronous RGB images to enhance depth estimation remains a challenge. In our study, we introduce a unified transformer that combines both event and RGB modalities to achieve precise depth prediction. In contrast to individual transformers per input modality, a unified transformer model captures inter-modal dependencies and uses self-attention to enhance event-RGB contextual interactions. This approach exceeds the performance of the recurrent neural network (RNN) methods used in state-of-the-art models. To encode the temporal information from events, ConvLSTMs are used before the transformer, further improving depth estimation. Our proposed architecture outperforms existing approaches in terms of absolute mean depth error, achieving state-of-the-art results in most cases. Similar gains are observed on other metrics, including RMSE, absolute relative difference, and depth-threshold accuracy. The source code is available at: https://***/anusha-devulapally/ER-F2D.
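As a rough illustration of the pipeline described above, the sketch below (PyTorch) runs a ConvLSTM over a sequence of event frames to encode temporal information, then tokenizes the aggregated event features together with the RGB image so a single transformer's self-attention spans both modalities. All shapes, channel counts, and the patch-embedding step are illustrative assumptions, not the authors' exact ER-F2D configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        # one conv yields the input/forget/output/candidate gates together
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, 3, padding=1)

    def forward(self, x, h, c):
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class UnifiedEventRGBEncoder(nn.Module):
    def __init__(self, ev_ch=5, rgb_ch=3, dim=128, patch=16):
        super().__init__()
        self.dim = dim
        self.convlstm = ConvLSTMCell(ev_ch, dim)
        self.ev_proj = nn.Conv2d(dim, dim, patch, stride=patch)      # patchify events
        self.rgb_proj = nn.Conv2d(rgb_ch, dim, patch, stride=patch)  # patchify RGB
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, event_seq, rgb):
        # event_seq: (B, T, C, H, W) event frames; rgb: (B, 3, H, W)
        b, t, _, hgt, wid = event_seq.shape
        h = event_seq.new_zeros(b, self.dim, hgt, wid)
        c = torch.zeros_like(h)
        for step in range(t):  # temporal aggregation before the transformer
            h, c = self.convlstm(event_seq[:, step], h, c)
        ev_tok = self.ev_proj(h).flatten(2).transpose(1, 2)
        rgb_tok = self.rgb_proj(rgb).flatten(2).transpose(1, 2)
        # one shared token sequence: self-attention spans both modalities
        return self.encoder(torch.cat([ev_tok, rgb_tok], dim=1))

tokens = UnifiedEventRGBEncoder()(torch.randn(1, 4, 5, 64, 64),
                                  torch.randn(1, 3, 64, 64))
print(tokens.shape)  # (1, 32, 128): 16 event + 16 RGB tokens
```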
ISBN: (Print) 9798350365474
The adoption of Vision Transformer (ViT)-based architectures represents a significant advancement in 3D Medical Image (MI) segmentation, surpassing traditional Convolutional Neural Network (CNN) models by enhancing global contextual understanding. While this paradigm shift has significantly improved 3D segmentation performance, state-of-the-art approaches require extremely large and complex models with large-scale computing resources for training and deployment. Furthermore, in the context of limited datasets, often encountered in medical imaging, larger models can present hurdles in both model generalization and convergence. In response to these challenges, and to demonstrate that lightweight models are a valuable area of research in 3D medical imaging, we present SegFormer3D, a hierarchical Transformer that calculates attention across multiscale volumetric features. Additionally, SegFormer3D avoids complex decoders and uses an all-MLP decoder to aggregate local and global attention features to produce highly accurate segmentation masks. The proposed memory-efficient Transformer preserves the performance characteristics of a significantly larger model in a compact design. SegFormer3D democratizes deep learning for 3D medical image segmentation by offering a model with 33x fewer parameters and a 13x reduction in GFLOPS compared to the current state-of-the-art (SOTA). We benchmark SegFormer3D against the current SOTA models on three widely used datasets: Synapse, BraTS, and ACDC, achieving competitive results. Code: https://***/OSUPCVLab/***
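To make the decoder design concrete, here is a hedged PyTorch sketch of an all-MLP decoder over multiscale volumetric features in the spirit described above: each scale is linearly projected (1x1x1 convolution) to a shared width, trilinearly upsampled, concatenated, and fused into segmentation logits. The channel widths, scale count, and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AllMLPDecoder3D(nn.Module):
    def __init__(self, in_chs=(32, 64, 160, 256), dim=128, num_classes=4):
        super().__init__()
        # per-scale 1x1x1 "linear" projections to a common embedding width
        self.proj = nn.ModuleList([nn.Conv3d(c, dim, 1) for c in in_chs])
        self.fuse = nn.Conv3d(dim * len(in_chs), dim, 1)  # MLP fusion step
        self.head = nn.Conv3d(dim, num_classes, 1)        # per-voxel logits

    def forward(self, feats):
        # feats: list of (B, C_i, D_i, H_i, W_i) tensors, finest scale first
        target = feats[0].shape[2:]
        ups = [F.interpolate(p(f), size=target, mode="trilinear",
                             align_corners=False)
               for p, f in zip(self.proj, feats)]
        return self.head(F.relu(self.fuse(torch.cat(ups, dim=1))))

feats = [torch.randn(1, c, 16 // 2**i, 16 // 2**i, 16 // 2**i)
         for i, c in enumerate((32, 64, 160, 256))]
print(AllMLPDecoder3D()(feats).shape)  # (1, 4, 16, 16, 16)
```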
ISBN: (Print) 9798350365474
Eye-tracking technology is integral to numerous consumer electronics applications, particularly in the realm of virtual and augmented reality (VR/AR). These applications demand solutions that excel in three crucial aspects: low latency, low power consumption, and precision. Yet achieving optimal performance on all these fronts presents a formidable challenge, necessitating a balance between sophisticated algorithms and efficient backend hardware implementations. In this study, we tackle this challenge through a synergistic software/hardware co-design of the system with an event camera. Leveraging the inherent sparsity of event-based input data, we integrate a novel sparse FPGA dataflow accelerator customized for submanifold sparse convolutional neural networks (SCNNs). The SCNN implemented on the accelerator efficiently extracts an embedding feature vector from each event-slice representation by processing only the non-zero activations. These vectors are then further processed by a gated recurrent unit (GRU) and a fully connected layer on the host CPU to generate the eye centers. Deployment and evaluation of our system reveal outstanding performance metrics. On the Event-based Eye-Tracking-AIS2024 dataset, our system achieves 81% p5 accuracy, 99.5% p10 accuracy, and a mean Euclidean distance of 3.71 with 0.7 ms latency, while consuming only 2.29 mJ per inference. Notably, our solution opens up opportunities for future eye-tracking systems. Code is available at https://***/CASRHKU/ESDA/tree/eye_tracking.
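The software-side shape of this pipeline might look like the hedged sketch below: per-slice embedding vectors (produced on the FPGA by the SCNN, emulated here with a dense CNN stand-in, since submanifold sparse convolution requires a specialized library) feed a GRU and a fully connected layer that regresses the eye center. All layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class EyeTracker(nn.Module):
    def __init__(self, in_ch=2, feat_dim=64, hid=128):
        super().__init__()
        # dense stand-in for the submanifold sparse CNN feature extractor
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gru = nn.GRU(feat_dim, hid, batch_first=True)
        self.fc = nn.Linear(hid, 2)  # (x, y) eye-center coordinates

    def forward(self, slices):
        # slices: (B, T, C, H, W) sequence of event-slice representations
        b, t = slices.shape[:2]
        feats = self.backbone(slices.flatten(0, 1)).view(b, t, -1)
        out, _ = self.gru(feats)    # temporal smoothing across slices
        return self.fc(out[:, -1])  # predict the center from the last step

print(EyeTracker()(torch.randn(2, 8, 2, 64, 64)).shape)  # (2, 2)
```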
ISBN: (Print) 9798350365474
Camera-based remote photoplethysmography (rPPG) enables contactless measurement of important physiological signals such as pulse rate (PR). However, dynamic and unconstrained subject motion introduces significant variability into the facial appearance in video, confounding the ability of video-based methods to accurately extract the rPPG signal. In this study, we leverage the 3D facial surface to construct a novel orientation-conditioned facial texture video representation that improves the motion robustness of existing video-based facial rPPG estimation methods. Our proposed method achieves a significant 18.2% performance improvement in cross-dataset testing on MMPD over our baseline using the PhysNet model trained on PURE, highlighting the efficacy and generalization benefits of our designed video representation. We demonstrate significant performance improvements of up to 29.6% in all tested motion scenarios in cross-dataset testing on MMPD, even in the presence of dynamic and unconstrained subject motion, emphasizing the benefits of disentangling motion by modeling the 3D facial surface for motion-robust facial rPPG estimation. We validate the efficacy of our design decisions and the impact of different video processing steps through an ablation study. Our findings illustrate the potential strengths of exploiting the 3D facial surface as a general strategy for addressing dynamic and unconstrained subject motion in videos. The code is available at https://***/orientation-uv-rppg/.
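One plausible reading of an "orientation-conditioned" texture is sketched below: given a UV-unwrapped facial texture frame and per-texel surface normals from a fitted 3D face, texels are weighted by how directly they face the camera, suppressing oblique regions whose appearance changes most under head motion. The normals, view direction, and threshold here are illustrative assumptions, not the authors' exact construction.

```python
import numpy as np

def orientation_condition(uv_texture, uv_normals, view_dir=(0.0, 0.0, 1.0),
                          min_cos=0.3):
    """uv_texture: (H, W, 3) sampled facial texture; uv_normals: (H, W, 3)
    unit surface normals in camera coordinates for each texel."""
    v = np.asarray(view_dir, dtype=np.float32)
    v /= np.linalg.norm(v)
    cos = np.clip(uv_normals @ v, 0.0, 1.0)      # alignment with the camera
    weight = np.where(cos >= min_cos, cos, 0.0)  # drop near-tangent texels
    return uv_texture * weight[..., None], weight

tex = np.random.rand(64, 64, 3).astype(np.float32)
nrm = np.tile(np.float32([0, 0, 1]), (64, 64, 1))  # toy: all texels face camera
cond_tex, w = orientation_condition(tex, nrm)
print(cond_tex.shape, w.min(), w.max())
```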
ISBN: (Print) 9798350365474
Small farms account for a large share of the productive land in developing countries. In regions such as sub-Saharan Africa, where 80% of farms are small (under 2 ha in size), mapping smallholder cropland is an important part of tracking sustainability measures such as crop productivity. However, the visually diverse and nuanced appearance of small farms has limited the effectiveness of traditional approaches to cropland mapping. Here we introduce a new approach based on the detection of harvest piles characteristic of many smallholder systems throughout the world. We present HarvestNet, a dataset for mapping the presence of farms in the Ethiopian regions of Tigray and Amhara during 2020-2023, collected using expert knowledge and satellite images, totaling 7k hand-labeled images and 2k ground-collected labels. We also benchmark a set of baselines, including SOTA models in remote sensing; our best models achieve around 80% classification performance on hand-labeled data and 90% and 98% accuracy on ground-truth data for Tigray and Amhara, respectively. We also perform a visual comparison with a widely used pre-existing coverage map and show that our model detects an extra 56,621 hectares of cropland in Tigray. We conclude that remote sensing of harvest piles can contribute to more timely and accurate cropland assessments in food-insecure regions. The dataset can be accessed through https://***/s/45a7b45556b90a9a11d2, while the code for the dataset and benchmarks is publicly available at https://***/jonxuxu/harvestpiles.
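A baseline in the spirit of those benchmarked above could be as simple as fine-tuning a pretrained backbone as a binary harvest-pile classifier on satellite chips; the sketch below uses ResNet-18 and generic hyperparameters purely as illustrative stand-ins, not the authors' actual baseline setup.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")      # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)  # pile / no-pile head
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(images, labels):
    # images: (B, 3, H, W) satellite chips; labels: (B,) in {0, 1}
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
    return loss.item()

print(train_step(torch.randn(4, 3, 224, 224), torch.tensor([0, 1, 1, 0])))
```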
ISBN: (Print) 9798350365474
Multi-modality information fusion can compensate for the deficiencies of a single modality and provide rich scene information for 2D semantic segmentation. However, inconsistency in the feature space between different modalities may lead to poor representation of objects, which degrades subsequent segmentation. Modality transition can reduce these modal differences and avoid biased processing during fusion, but it is hard to perfectly retain the content of the source images. To address these challenges, we propose a fusion method based on dual-cycle cross-awareness of the structure tensor. First, we propose a dual-cycle modality transition network based on cross-awareness consistency to learn the differences in feature space between modalities. Second, a set of globally structure-tensor-preserving modules is designed to enhance the network's ability to capture complementary features and perceive global modal consistency. Under the joint constraint of a globally structure-tensor-aware loss and a cross-awareness loss, our network achieves a robust mapping of the feature space from visible to pseudo-infrared images without relying on ground truth. Finally, the pseudo-infrared images, which inherit the superior qualities of both modalities, are fused directly with the original infrared images, effectively reducing the complexity of fusion. Extensive comparative experiments show that our method outperforms state-of-the-art methods in both qualitative and quantitative evaluation.
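For intuition, the sketch below shows one way a structure-tensor-preserving loss can be formed: image gradients define the tensor components (Ix^2, IxIy, Iy^2), which are locally smoothed, and the loss penalizes tensor differences between the pseudo-infrared output and a source image. The gradient kernels, smoothing window, and L1 penalty are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def structure_tensor(img, ksize=5):
    # img: (B, 1, H, W) grayscale; Sobel kernels for finite differences
    kx = img.new_tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    kx = kx.view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    ix = F.conv2d(img, kx, padding=1)
    iy = F.conv2d(img, ky, padding=1)
    # local averaging plays the role of the Gaussian smoothing window
    smooth = lambda t: F.avg_pool2d(t, ksize, stride=1, padding=ksize // 2)
    return smooth(ix * ix), smooth(ix * iy), smooth(iy * iy)

def structure_tensor_loss(pseudo_ir, source):
    return sum(F.l1_loss(a, b) for a, b in
               zip(structure_tensor(pseudo_ir), structure_tensor(source)))

print(structure_tensor_loss(torch.rand(1, 1, 64, 64), torch.rand(1, 1, 64, 64)))
```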
ISBN: (Print) 9798350302493
Image completion is widely used in photo restoration and editing applications, e.g., for object removal. Recently, there has been a surge of research on generating diverse completions for missing regions. However, existing methods require large training sets from a specific domain of interest and often fail on general-content images. In this paper, we propose a diverse completion method that does not require a training set and can thus treat arbitrary images from any domain. Our internal diverse completion (IDC) approach draws inspiration from recent single-image generative models that are trained on multiple scales of a single image, adapting them to the extreme setting in which only a small portion of the image is available for training. We illustrate the strength of IDC on several datasets, using both user studies and quantitative comparisons.
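The key adaptation here, training a generative model when only part of the image exists, implies that every training loss must be restricted to known pixels. The fragment below sketches such a masked reconstruction term; it is an illustrative piece under that assumption, not IDC's full multi-scale training loop.

```python
import torch

def masked_rec_loss(generated, image, known_mask):
    # known_mask: (B, 1, H, W), 1 where the original pixel is available
    se = ((generated - image) ** 2 * known_mask).sum()
    denom = known_mask.sum() * generated.shape[1]  # known pixels x channels
    return se / denom.clamp(min=1)

g = torch.rand(1, 3, 32, 32)
x = torch.rand(1, 3, 32, 32)
m = (torch.rand(1, 1, 32, 32) > 0.4).float()  # toy mask: ~60% of pixels known
print(masked_rec_loss(g, x, m))
```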
ISBN: (Print) 9798350365474
Knowledge Distillation (KD) has been extensively studied as a means to enhance the performance of smaller models in Convolutional Neural Networks (CNNs). Recently, the Vision Transformer (ViT) has demonstrated remarkable success in various computer vision tasks, leading to an increased demand for KD in ViT. However, while logit-based KD has been applied to ViT, other feature-based KD methods designed for CNNs cannot be directly implemented due to the significant structural gap. In this paper, we analyze the properties of different feature layers in ViT to identify a method for feature-based ViT distillation. Our findings reveal that both shallow and deep layers in ViT are equally important for distillation but require distinct distillation strategies. Based on these guidelines, we propose our feature-based method ViTKD, which mimics the teacher's shallow layers and generates its deep-layer features. ViTKD leads to consistent and significant improvements in the student models. On ImageNet-1K, we achieve performance boosts of 1.64% for DeiT-Tiny, 1.40% for DeiT-Small, and 1.70% for DeiT-Base. Downstream tasks also demonstrate the superiority of ViTKD. Additionally, ViTKD and logit-based KD are complementary and can be applied together directly, further enhancing the student's performance. Specifically, DeiT-T, S, and B achieve accuracies of 77.78%, 83.59%, and 85.41%, respectively, using this combined approach. Code is available at https://github.com/yzdv/cls_KD.
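A hedged sketch of the two-part feature loss this suggests: shallow student features directly mimic the teacher's (projection plus MSE), while deep teacher features are generated from partially masked student tokens by a small generation block. The dimensions, masking ratio, and block design are illustrative assumptions rather than ViTKD's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViTKDLoss(nn.Module):
    def __init__(self, s_dim=192, t_dim=768, mask_ratio=0.5):
        super().__init__()
        self.align = nn.Linear(s_dim, t_dim)  # shallow-mimicking projection
        self.generate = nn.Sequential(        # deep-generation block
            nn.Linear(s_dim, t_dim), nn.GELU(), nn.Linear(t_dim, t_dim))
        self.mask_ratio = mask_ratio

    def forward(self, s_shallow, t_shallow, s_deep, t_deep):
        # all inputs: (B, N, dim) token features from matching layers
        loss_mimic = F.mse_loss(self.align(s_shallow), t_shallow)
        keep = (torch.rand(s_deep.shape[:2], device=s_deep.device)
                > self.mask_ratio).float().unsqueeze(-1)
        # generate the teacher's deep features from masked student tokens
        loss_gen = F.mse_loss(self.generate(s_deep * keep), t_deep)
        return loss_mimic + loss_gen

crit = ViTKDLoss()
loss = crit(torch.randn(2, 196, 192), torch.randn(2, 196, 768),
            torch.randn(2, 196, 192), torch.randn(2, 196, 768))
print(loss)
```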
ISBN: (Print) 9798350365474
Human communication is multi-modal; e.g., face-to-face interaction involves auditory signals (speech) and visual signals (face movements and hand gestures). Hence, it is essential to exploit multiple modalities when designing machine learning-based facial expression recognition systems. In addition, given the ever-growing quantities of video data that capture human facial expressions, such systems should utilize raw unlabeled videos without requiring expensive annotations. Therefore, in this work, we employ a multi-modal multi-task self-supervised learning method for facial expression recognition from in-the-wild video data. Our model combines three self-supervised objective functions: first, a multi-modal contrastive loss that pulls diverse data modalities of the same video together in the representation space; second, a multi-modal clustering loss that preserves the semantic structure of the input data in the representation space; and finally, a multi-modal data reconstruction loss. We conduct a comprehensive study of this multi-modal multi-task self-supervised learning method on three facial expression recognition benchmarks. To that end, we examine the performance of learning through different combinations of self-supervised tasks on the facial expression recognition downstream task. Our model, ConCluGen, outperforms several multi-modal self-supervised and fully supervised baselines on the CMU-MOSEI dataset. Our results generally show that multi-modal self-supervision tasks offer large performance gains for challenging tasks such as facial expression recognition, while also reducing the amount of manual annotation required. We release our pre-trained models as well as source code publicly.
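To make the three objectives concrete, the sketch below combines them for paired video/audio embeddings: an InfoNCE contrastive term treating matching clips as positives, a simple prototype-based clustering term (a stand-in for the paper's clustering objective), and a reconstruction term. The temperatures, prototype count, and decoder are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskSSL(nn.Module):
    def __init__(self, dim=128, n_proto=32):
        super().__init__()
        self.prototypes = nn.Linear(dim, n_proto, bias=False)
        self.decoder = nn.Linear(dim, 512)  # reconstruct raw input features

    def forward(self, z_vid, z_aud, target_feats, tau=0.1):
        zv, za = F.normalize(z_vid, dim=1), F.normalize(z_aud, dim=1)
        # 1) contrastive: modalities of the same clip are positives
        logits = zv @ za.t() / tau
        labels = torch.arange(len(zv), device=zv.device)
        l_con = (F.cross_entropy(logits, labels)
                 + F.cross_entropy(logits.t(), labels)) / 2
        # 2) clustering: one modality's prototype assignment supervises the other
        pv, pa = self.prototypes(zv) / tau, self.prototypes(za) / tau
        l_clu = (F.cross_entropy(pv, pa.argmax(1))
                 + F.cross_entropy(pa, pv.argmax(1))) / 2
        # 3) reconstruction of input features from the video embedding
        l_rec = F.mse_loss(self.decoder(z_vid), target_feats)
        return l_con + l_clu + l_rec

m = MultiTaskSSL()
print(m(torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 512)))
```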
ISBN: (Print) 9798350365474
Osteoclast cell image analysis plays a key role in osteoporosis research, but it typically involves extensive manual image processing and hand annotation by a trained expert. In the last few years, a handful of machine learning approaches to osteoclast image analysis have been developed, but none have addressed the full instance segmentation task required to produce the same output as the human expert-led process. Furthermore, none of the prior fully automated algorithms have publicly available code, pretrained models, or annotated datasets, inhibiting reproduction and extension of their work. We present a new dataset with ~2 x 10^5 expert-annotated mouse osteoclast masks, together with a deep learning instance segmentation method that works for both in vitro mouse osteoclast cells on plastic tissue culture plates and human osteoclast cells on bone chips. To our knowledge, this is the first work to automate the full osteoclast instance segmentation task. Our method achieves a performance of 0.82 mAP@0.5 (mean average precision at an intersection-over-union threshold of 0.5) in cross-validation on mouse osteoclasts. We present a novel nuclei-aware osteoclast instance segmentation training strategy (NOISe), based on the unique biology of osteoclasts, to improve the model's generalizability and boost the mAP@0.5 from 0.60 to 0.82 on human osteoclasts. We publish our annotated mouse osteoclast image dataset, instance segmentation models, and code at ***/michaelwwan/noise to enable reproducibility and to provide a public tool to accelerate osteoporosis research.
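A nuclei-aware two-stage strategy like the one described might be organized as below: pretrain a generic instance segmentation model on nuclei annotations, then fine-tune the same weights on osteoclast masks. The Mask R-CNN choice, data loaders, and hyperparameters are illustrative assumptions, not the authors' exact recipe.

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

def train(model, loader, epochs, lr):
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:  # targets hold boxes, labels, masks
            losses = model(images, targets)  # dict of detection losses
            loss = sum(losses.values())
            opt.zero_grad()
            loss.backward()
            opt.step()

model = maskrcnn_resnet50_fpn(num_classes=2)  # background + one instance class
# stage 1: nuclei pretraining; stage 2: osteoclast fine-tuning at a lower lr
# (nuclei_loader and osteoclast_loader are hypothetical dataset loaders)
# train(model, nuclei_loader, epochs=10, lr=5e-3)
# train(model, osteoclast_loader, epochs=10, lr=5e-4)
```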