Search Results - Inner Mongolia University Library

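Help

Field codes:

T = title (book title, article title), A = author (creator), K = subject term, P = publication title, PU = publisher, O = institution (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification code, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be written in uppercase, with one space on each side.

Examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982-2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field C++ or Basic, excluding subject term Visual, years 2011-2016)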
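The query syntax above is regular enough to assemble programmatically. The following is a minimal Python sketch, assuming queries are plain strings typed into the search box; the helper names `clause`, `year_range`, `combine`, and `group` are hypothetical and not part of any library interface. The help text does not state operator precedence, so grouping is kept explicit. The sketch rebuilds both examples.

```python
# Hypothetical helpers for composing queries in the field-code syntax above.
# Field codes come from the help text; precedence is not documented, so
# parentheses are added explicitly with group().
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def clause(code: str, value: str) -> str:
    """Build one field clause such as 'K=图书馆学'."""
    if code not in FIELD_CODES:
        raise ValueError(f"unknown field code: {code}")
    return f"{code}={value}"


def year_range(start: int, end: int) -> str:
    """Build a year clause such as 'Y=1982-2016'."""
    return clause("Y", f"{start}-{end}")


def combine(op: str, *parts: str) -> str:
    """Join parts with an uppercase operator and one space on each side."""
    if op not in OPERATORS:
        raise ValueError(f"operator must be uppercase AND, OR or NOT: {op}")
    return f" {op} ".join(parts)


def group(expr: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expr})"


example1 = combine(
    "AND",
    group(combine("OR", clause("K", "图书馆学"), clause("K", "情报学"))),
    clause("A", "范并思"),
    year_range(1982, 2016),
)

example2 = combine(
    "AND",
    combine(
        "NOT",
        combine("AND", clause("P", "计算机应用与软件"),
                group(combine("OR", clause("U", "C++"), clause("U", "Basic")))),
        clause("K", "Visual"),
    ),
    year_range(2011, 2016),
)

print(example1)  # (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
print(example2)  # P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase.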

Refine search results

Document type

17,638 conference papers
255 books
189 journal articles
1 thesis

Holdings scope

18,082 electronic documents
2 print holdings

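Subject classification

10,443 Engineering
- 6,148 Computer Science and Technology...
- 3,929 Electrical Engineering
- 3,741 Control Science and Engineering
- 2,823 Software Engineering
- 1,836 Information and Communication Engineering
- 1,551 Optical Engineering
- 1,405 Mechanical Engineering
- 997 Instrument Science and Technology
- 549 Biomedical Engineering (may confer...
- 498 Electronic Science and Technology (may...
- 433 Bioengineering
- 232 Materials Science and Engineering (may...
- 195 Transportation Engineering
- 163 Safety Science and Engineering
- 153 Chemical Engineering and Technology
- 137 Mechanics (may confer Engineering, Sci...
- 114 Architecture
- 109 Civil Engineering
3,398 Science
- 2,546 Physics
- 805 Mathematics
- 486 Biology
- 295 Systems Science
- 209 Statistics (may confer Science, ...
- 134 Chemistry
1,654 Medicine
- 1,577 Clinical Medicine
- 185 Basic Medicine (may confer Medicine...
759 Management
- 580 Management Science and Engineering (may...
- 190 Library, Information and Archives Manage...
- 120 Business Administration
107 Agriculture
- 104 Crop Science
78 Law
43 Economics
42 Education
39 Art
37 Military Science
18 Literature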

Topics

2,731 computer vision
1,685 cameras
1,485 signal processin...
1,441 robot vision sys...
1,352 image processing
1,169 robot sensing sy...
907 signal processin...
875 mobile robots
835 feature extracti...
767 machine vision
549 image segmentati...
504 object detection
439 visualization
423 deep learning
408 robustness
391 estimation
367 stereo vision
356 navigation
343 training
318 robot kinematics

Institutions

83 centre for visio...
63 xi an jiao tong ...
54 centre for visio...
37 school of electr...
37 centre for visio...
29 carnegie mellon ...
28 chinese acad sci...
27 shanghai jiao to...
27 center for machi...
27 university of ch...
23 centre for visio...
23 harbin inst tech...
21 univ chinese aca...
21 nanyang technol ...
17 centre for visio...
16 university of sc...
16 tsinghua univers...
13 chinese acad sci...
13 univ sci & techn...
13 chinese univ hon...

Authors

52 j. kittler
40 josef kittler
28 nakadai kazuhiro
19 anil fernando
18 wang wei
15 chen chen
14 yang yang
14 nascimento jacin...
13 jing zhang
13 liu yang
13 sun fuchun
12 sun lining
12 hansung kim
11 zhang lei
11 bartolozzi chiar...
11 hong liu
10 wang lei
10 li yang
10 aguiar pedro m. ...
10 qiuqiang kong

Language

17,904 English
87 Chinese
78 Other
12 Turkish
3 Russian
2 Spanish

Search criteria: "Any field = International Conference on Robot Vision and Signal Processing"

18,083 records in total; records 51-60 are shown below.

18th International Workshop on Design and Architecture for Signal and Image Processing, DASIP 2025

18th International Workshop on Design and Architecture for Signal and Image Processing, DASIP 2025

ISBN (print): 9783031878961

The proceedings contain 10 papers. The special focus of this conference is design and architectures for signal and image processing. The topics include: LiFT: Lightweight, FPGA-Tailored 3D Object Detection Based on LiDAR Data; A Practical HW-Aware NAS Flow for AI Vision Applications on Embedded Heterogeneous SoCs; Endoscopy Image Classification for Wireless Capsules with CNNs on Microcontroller-Based Platforms; Joint Underwater Depth Estimation and Dehazing from a Single Image Using Attention U-Net; KD-AHOSVD: Neural Network Compression via Knowledge Distillation and Tensor Decomposition; Novel Scheduling and Shifter Networks for 5G LDPC Decoders; Comparison Between In-Core Hardware IDS, Off-Core Hardware IDS and Software IDS; Comparative Study of Memory Optimization Techniques for Dataflow-Modeled Applications.

Rethinking Mamba in Speech Processing by Self-Supervised Models

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Zhang, Xiangyu; Ma, Jianbo; Shahin, Mostafa; Ahmed, Beena; Epps, Julien (The University of New South Wales, Australia; Dolby Laboratories, United States)

ISBN (print): 9798350368741

The Mamba-based model has demonstrated outstanding performance across tasks in computer vision, natural language processing, and speech processing. In speech processing, however, its performance varies from task to task. For instance, in tasks such as speech enhancement and spectrum reconstruction, the Mamba model performs well on its own, whereas for tasks like speech recognition, additional modules are required to surpass the performance of attention-based models. We hypothesize that the Mamba-based model excels at "reconstruction" tasks within speech processing, while for "classification" tasks such as speech recognition, additional modules are necessary to accomplish the "reconstruction" step. To validate this hypothesis, we analyze previous Mamba-based speech models from an information-theoretic perspective. Furthermore, we leverage the properties of HuBERT: we trained a Mamba-based HuBERT model, and its mutual information patterns, along with its performance metrics, confirmed our assumptions. © 2025 IEEE.

Keywords: Mamba; Speech processing

Spatiotemporal-Aware Visual Captioning using Vision-Language Pre-Training Model

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Wu, Shuai; Yang, Weidong; Wu, Shuyan (School of Computer Science, Fudan University, Shanghai, China; Faculty of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China)

ISBN (print): 9798350368741

Current visual captioning technologies typically transform 3D/2D visual information into one-dimensional sequential data and employ language models to generate corresponding descriptions. This approach, however, compromises the spatiotemporal information in visual data, making it difficult for models to capture temporal variations and the relative spatial relationships between objects. To address this issue, we propose STPos-VC, a pre-trained vision-language model that maps visual information from the visual vector space to the textual vector space through a visual-text mapper and generates natural language descriptions using a decoder. The mapper incorporates three-dimensional rotational position encoding, which effectively preserves the relative spatiotemporal positional relationships. Furthermore, we pre-train the model on a mixed dataset comprising images and videos through a visual question-answering framework, enabling the model to perform well even with small sample sizes. Experimental results across multiple datasets demonstrate that, compared to existing methods, STPos-VC achieves superior performance in both general-purpose and domain-specific applications. © 2025 IEEE.

Keywords: Multimodality; Pre-training; Spatiotemporal position encoding; Visual Language Model
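The abstract names three-dimensional rotational position encoding without giving its form. One common construction, used here only as an assumption and not as STPos-VC's actual formulation, splits the channels into time, height, and width groups and applies ordinary rotary position encoding (RoPE) to each group with its own coordinate. The Python sketch below (all function names hypothetical) checks the property the abstract relies on: after rotation, the query-key dot product depends only on the relative spatiotemporal offset between two positions.

```python
import math


def rotate_pairs(vec, pos, base=10000.0):
    """1D rotary encoding: rotate each (even, odd) channel pair by pos * base**(-i/d)."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        angle = pos * base ** (-i / d)
        c, s = math.cos(angle), math.sin(angle)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def rope_3d(vec, t, h, w):
    """Split channels into equal time / height / width groups and rotate each
    group by its own coordinate (a hypothetical 3D extension of RoPE)."""
    if len(vec) % 6 != 0:
        raise ValueError("channel count must be divisible by 6")
    k = len(vec) // 3
    return (rotate_pairs(vec[:k], t)
            + rotate_pairs(vec[k:2 * k], h)
            + rotate_pairs(vec[2 * k:], w))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5, 1.1, -0.4]
k = [0.9, 0.2, -0.7, 1.3, 0.1, 0.6]

# Both position pairs are separated by the same offset (3, 4, 5) in (t, h, w),
# so the attention scores should match.
s1 = dot(rope_3d(q, 1, 2, 3), rope_3d(k, 4, 6, 8))
s2 = dot(rope_3d(q, 0, 0, 0), rope_3d(k, 3, 4, 5))
print(math.isclose(s1, s2, abs_tol=1e-9))  # True
```

Because each rotation preserves vector length and composes additively in angle, absolute positions cancel in the dot product, which is why this family of encodings preserves relative positional relationships.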

MixSense: Mixture of Vision Sense

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Lin, Jian; Wang, Zhuoran; Qiu, Qibo; Chen, Jianzhong; Ge, Zixian; Jin, Weizhong; Yan, Yuchao; Yu, Li (Hangzhou, China; School of Computing and Information, University of Pittsburgh, Pittsburgh, United States)

ISBN (print): 9798350368741

Fine-tuning Large Multimodal Models (LMMs) on task-related conversation data to adapt them to specific visual tasks is a new trend. This approach provides a new paradigm for solving various vision-language tasks; however, it still faces two problems: (1) the global visual features input to the backbone Large Language Model (LLM) lack focus on task-specific information, which may lead to sub-optimal performance on specific visual tasks; (2) hallucinations on specific tasks are underestimated, which makes the model prone to factual errors during inference. To solve these two problems, we propose MixSense, a new LMM paradigm that jointly aligns the visual features of specialized models and general models to the backbone LLM, increasing the focus on task-specific information while maintaining the model's vanilla capabilities. By evaluating and fine-tuning the model with constructed task-specific negative samples, we can further assess the degree of hallucination on specific visual tasks and reduce it. We validate our method on a popular Object Detection (OD) task, Referring Expression Comprehension (REC), and extensive experiments demonstrate the effectiveness of our proposal. © 2025 IEEE.

Keywords: Large Multimodal Model; Negative Samples; Specific Tasks

PeT-KeyStAtion: Parameter-efficient Transformer with Keypoint-guided Spatial-temporal Aggregation for Video-based Person Re-identification

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Ma, Xingan; Yi, Jinhui; Gall, Juergen (University of Bonn, Bonn, Germany)

ISBN (print): 9798350368741

Video-based person re-identification (ReID) is crucial in visual surveillance; it focuses on matching video snippets of individuals across multiple non-overlapping cameras. Existing methods either conduct ReID at the image level without leveraging temporal information, or employ complex temporal information aggregation techniques, which result in large networks and reduced efficiency. Recent Vision Transformer (ViT) architectures leverage diverse large-scale datasets alongside sophisticated designs to achieve enhanced fine-grained feature discrimination. To fully explore the potential of ViT architectures for video-based ReID without adding substantial extra modules, we propose PeT-KeyStAtion: a Parameter-efficient Transformer with Keypoint-guided Spatial-temporal Aggregation that uses a Spatial-Temporal and Keypoint (STK) Module with lightweight adapters. Our framework effectively captures and aggregates spatial, temporal, and keypoint information with only 11% of the parameters of full fine-tuning. Extensive experiments show that our method outperforms state-of-the-art baselines on MARS and iLIDS-VID and achieves promising performance on LS-VID. © 2025 IEEE.

Keywords: Parameter-efficient; Feature Fusion; Video-based Person Re-identification; Vision Transformer

VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Zhang, Bin; Qu, Xiaoyang; Li, Guokuan; Wan, Jiguang; Wang, Jianzong (Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China; Co. Ltd., Shenzhen, China)

ISBN (print): 9798350368741

As object detectors are increasingly deployed as black-box cloud services or pre-trained models with restricted access to the original training data, the challenge of zero-shot object-level out-of-distribution (OOD) detection arises. This task becomes crucial in ensuring the reliability of detectors in open-world settings. While existing methods have demonstrated success in image-level OOD detection using pre-trained vision-language models like CLIP, directly applying such models to object-level OOD detection presents challenges due to the loss of contextual information and reliance on image-level alignment. To tackle these challenges, we introduce a new method that leverages visual prompts and text-augmented in-distribution (ID) space construction to adapt CLIP for zero-shot object-level OOD detection. Our method preserves critical contextual information and improves the ability to differentiate between ID and OOD objects, achieving competitive performance across different benchmarks. © 2025 IEEE.

Keywords: vision-language representations; visual prompt; Zero-shot object-level OOD detection

Text-Guided Few-Shot Semantic Segmentation with Training-Free Multimodal Feature Matching

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Buthmann, Guillaume; Sakai, Tomoya; Qiu, Haoxiang; Katsuki, Takayuki; Kimura, Daiki (IBM Research, Tokyo, Japan; Mines Paris - PSL University, Paris, France)

ISBN (print): 9798350368741

This paper addresses few-shot semantic segmentation (FSS) guided by text, where we classify unseen novel classes using image and text references as in-context examples, without the need for training. We enhance the quality and stability of the segmentation masks generated by FSS by incorporating the capabilities of open-vocabulary zero-shot semantic segmentation (ZSS), which builds on image and text foundation models. We propose a training-free approach using multimodal feature matching that performs segmentation by identifying regions in a target image that match the features from both the image and text references. Experimental results demonstrate that the proposed method outperforms state-of-the-art FSS and ZSS methods. © 2025 IEEE.

Keywords: few-shot segmentation; semantic segmentation; vision-language model; zero-shot segmentation
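As a rough illustration of training-free multimodal feature matching, and not the authors' implementation, the Python sketch below assumes per-location features from a frozen encoder that shares an embedding space with a text encoder. It builds a class prototype from the masked region of the reference image, then labels each target location by a weighted mix of its cosine similarity to that prototype and to the text embedding. The weight `alpha`, the `threshold`, and all function names are hypothetical; nothing is trained, only similarities are computed.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def masked_mean(features, mask):
    """Average the reference-image features inside the reference mask (class prototype)."""
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("reference mask is empty")
    dim = len(selected[0])
    return [sum(f[i] for f in selected) / len(selected) for i in range(dim)]


def match_segment(target_feats, ref_feats, ref_mask, text_emb, alpha=0.5, threshold=0.5):
    """Mark each target location whose features match both the image and text references.

    target_feats / ref_feats: one feature vector per pixel or patch, assumed to come
    from a frozen image encoder aligned with the text encoder that produced text_emb.
    """
    prototype = masked_mean(ref_feats, ref_mask)
    scores = [alpha * cosine(f, prototype) + (1 - alpha) * cosine(f, text_emb)
              for f in target_feats]
    return [s >= threshold for s in scores]


# Toy 2-D "features": the reference mask covers only the first reference location.
target = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
reference = [[1.0, 0.0], [0.0, 1.0]]
mask = [True, False]
text = [1.0, 0.1]
print(match_segment(target, reference, mask, text))  # [True, False, True]
```

In practice the features would come from a vision-language foundation model and the boolean output would be reshaped into a mask; those parts are outside this sketch.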

Efficient Localized Perception for Resource-Constrained Vision Systems

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Subramanyam, A.V.; Singal, Niyati; Verma, Vinay K. (Department of ECE, IIIT Delhi, India; Department of CSE, IIIT Delhi, India)

ISBN (print): 9798350368741

Despite rapid advances in image recognition, processing high-resolution imagery remains computationally challenging. Yet such processing is pivotal for extracting detailed object information in areas ranging from autonomous vehicle navigation to medical image analysis. Our study introduces a framework that mitigates these challenges by leveraging memory-efficient, patch-based processing of high-resolution images. It incorporates a global context representation alongside local patch information, enabling a comprehensive understanding of the image content. In contrast to traditional training methods, which are limited by memory constraints, our method enables training on ultra-high-resolution images. We demonstrate the effectiveness of our method through superior performance on 4 benchmarks spanning classification, object detection, and segmentation. Notably, the proposed method achieves strong performance even on resource-constrained devices such as the Jetson Nano. Our code is available at https://***/ikVib. © 2025 IEEE.

Keywords: object recognition; patch based training; understanding high resolution images
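The abstract does not say how patches and global context are combined. The Python sketch below shows one generic arrangement, purely as an assumption: tile the image into fixed-size crops that are encoded one at a time, mean-pool the local features, and pair them with a feature computed from a downsampled view of the whole image. `encode` stands in for a neural backbone, and every name here is hypothetical.

```python
def _starts(size, patch, stride):
    """Start offsets along one axis; the last patch is shifted inward to reach the border."""
    if size <= patch:
        return [0]
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] + patch < size:
        starts.append(size - patch)
    return starts


def patch_boxes(height, width, patch, stride=None):
    """(top, left, bottom, right) boxes that together cover the whole image."""
    stride = stride or patch
    return [(top, left, min(top + patch, height), min(left + patch, width))
            for top in _starts(height, patch, stride)
            for left in _starts(width, patch, stride)]


def crop(image, box):
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def downsample(image, factor):
    """Cheap global view: keep every factor-th row and column."""
    return [row[::factor] for row in image[::factor]]


def encode_high_res(image, patch, factor, encode):
    """Encode one crop at a time, mean-pool the local features, and pair the
    result with a feature of the downsampled global view."""
    height, width = len(image), len(image[0])
    local = [encode(crop(image, box)) for box in patch_boxes(height, width, patch)]
    return sum(local) / len(local), encode(downsample(image, factor))


def mean_intensity(region):
    """Stand-in for a neural encoder: returns a single scalar feature."""
    return sum(map(sum, region)) / (len(region) * len(region[0]))


image = [[1.0] * 10 for _ in range(10)]
print(patch_boxes(10, 10, 4)[-1])                    # (6, 6, 10, 10)
print(encode_high_res(image, 4, 5, mean_intensity))  # (1.0, 1.0)
```

Only one crop is held at a time in the forward pass shown here, so peak memory for the image data follows the patch size rather than the full resolution; training-time memory savings in a real framework would need further measures not shown.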

Dynamic SpikFormer: Low-Latency & Energy-Efficient Spiking Neural Networks with Dynamic Time Steps for Vision Transformers

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Datta, Gourav; Liu, Zeyu; Li, Anni; Beerel, Peter A. (Dept. of Electrical, Computer & Systems Engineering, Case Western Reserve University, Cleveland, United States; Ming Hsieh Dept. of Electrical and Computer Engineering, University of Southern California, Los Angeles, United States)

ISBN (print): 9798350368741

Spiking Neural Networks (SNNs) have emerged as a popular spatio-temporal computing paradigm for complex vision tasks. Recently proposed SNN training algorithms have significantly reduced the number of time steps (down to 1) for improved latency and energy efficiency; however, they target only convolutional neural networks (CNNs). When applied to the recently spotlighted vision transformers (ViTs), these algorithms either require a large number of time steps or fail to converge. Based on an analysis of the histograms of the ANN and SNN activation maps, we hypothesize that each ViT block has a different sensitivity to the number of time steps. We propose a novel training framework that dynamically allocates the number of time steps to each ViT module depending on a trainable score assigned to each time step. In particular, we generate a scalar binary time-step mask that filters the spikes emitted by each neuron in a leaky-integrate-and-fire (LIF) layer. The resulting SNNs have high activation sparsity and, except for the input embedding layer, require only accumulate (AC) operations, in contrast to the expensive multiply-and-accumulate (MAC) operations needed in traditional ViTs. This yields significant improvements in energy efficiency. We evaluate our training framework and the resulting SNNs on image recognition tasks including CIFAR10, CIFAR100, and ImageNet with different ViT architectures. We obtain a test accuracy of 95.97% with 4.97 time steps with direct encoding on CIFAR10. © 2025 IEEE.
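To make the masking idea concrete, here is a minimal Python sketch of a single LIF neuron whose output spikes are multiplied by a binary per-time-step mask derived from scores. The decay `beta`, the threshold `v_th`, the soft reset, and the rule "keep a step when its score is positive" are assumptions for illustration, and how the paper trains the scores is not shown. In this sketch the neuron still integrates and resets on masked steps and only the emitted spike is suppressed, which is one reading of "filters the spikes".

```python
def lif_with_step_mask(inputs, scores, beta=0.5, v_th=1.0):
    """Simulate one leaky-integrate-and-fire neuron over T time steps.

    inputs: input current at each time step.
    scores: one score per time step; a step is kept (mask = 1) when its score
            is positive. This thresholding rule is a hypothetical choice.
    Returns the masked spike train and the number of active time steps.
    """
    mask = [1 if s > 0 else 0 for s in scores]
    u = 0.0  # membrane potential
    spikes = []
    for x, m in zip(inputs, mask):
        u = beta * u + x              # leaky integration
        fired = 1 if u >= v_th else 0
        if fired:
            u -= v_th                 # soft reset: subtract the threshold
        spikes.append(fired * m)      # the binary mask filters the emitted spike
    return spikes, sum(mask)


spikes, active_steps = lif_with_step_mask([1.2, 0.3, 0.9, 1.0], [0.5, -1.0, 0.2, -0.3])
print(spikes, active_steps)  # [1, 0, 1, 0] 2
```

The neuron would fire at the last step, but that step is masked out, so its spike never reaches the next layer. Because the outputs are 0/1, a following layer can add up the weights of active inputs instead of multiplying, which is the accumulate-only property the abstract refers to.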

Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Lee, Donghoon; Luu, Tung M.; Lee, Younghwan; Yoo, Chang D. (Robotics Program, KAIST, Daejeon, Republic of Korea; Electrical Engineering, KAIST, Daejeon, Republic of Korea)

ISBN (print): 9798350368741

Recent research highlights the potential of multimodal foundation models for tackling complex decision-making challenges. However, their large parameter counts make real-world deployment resource-intensive and often impractical for constrained systems. Reinforcement learning (RL) shows promise for task-specific agents but suffers from high sample complexity, limiting practical applications. To address these challenges, we introduce LVLM to Policy (LVLM2P), a novel framework that distills knowledge from large vision-language models (LVLMs) into more efficient RL agents. Our approach uses the LVLM as a teacher that provides instructional actions based on trajectories collected by the RL agent, which reduces less meaningful exploration in the early stages of learning and thereby significantly accelerates the agent's learning progress. Additionally, because the LVLM suggests actions directly from visual observations, we eliminate the need for manual textual descriptors of the environment, enhancing applicability across diverse tasks. Experiments show that LVLM2P significantly enhances the sample efficiency of baseline RL algorithms. The code is available at https://***/i22024/LVLM2P. © 2025 IEEE.

Keywords: Knowledge Distillation; Reinforcement Learning; Vision Language Model

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
