Search Results - Inner Mongolia University Library

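Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = subject classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "excluding". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)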
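The field codes and operator rules above can be sketched as a small query builder. This is only an illustration of the documented syntax: the helper names `term`, `combine`, and `group` are hypothetical and are not part of any library API or of the catalog itself. The sketch assumes the expression is typed into the advanced search box as a plain string.

```python
# Hypothetical helpers that assemble an advanced-search expression
# following the documented rules: uppercase operators, one space on
# each side of an operator, and FIELD=value terms.

FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by single spaces."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap an expression in parentheses so it is evaluated as a unit."""
    return f"({expression})"


# Rebuild Example 1 from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Validating the operator in `combine` mirrors the rule that operators must be uppercase, so a lowercase `and` fails early instead of producing a string the catalog might treat as a search term.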


Refine search results

Document type

25,252 conference papers
277 journal articles
21 books
3 theses

Collection scope

25,553 electronic documents
0 print holdings


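Subject classification

15,800 records: Engineering
- 9,866 records: Computer Science and Technology...
- 6,079 records: Electrical Engineering
- 5,771 records: Information and Communication Engineering
- 5,615 records: Software Engineering
- 2,016 records: Optical Engineering
- 1,453 records: Control Science and Engineering
- 1,280 records: Mechanical Engineering
- 1,155 records: Electronic Science and Technology (may...
- 873 records: Biomedical Engineering (may confer...
- 833 records: Bioengineering
- 793 records: Instrument Science and Technology
- 265 records: Cyberspace Security
- 253 records: Chemical Engineering and Technology
- 245 records: Safety Science and Engineering
- 239 records: Transportation Engineering
- 183 records: Materials Science and Engineering (may...
- 162 records: Civil Engineering
- 159 records: Architecture
5,716 records: Science
- 3,480 records: Physics
- 2,207 records: Mathematics
- 886 records: Biology
- 564 records: Statistics (may confer science, ...
- 420 records: Systems Science
- 310 records: Chemistry
3,023 records: Medicine
- 2,897 records: Clinical Medicine
- 312 records: Basic Medicine (may confer medicine...
- 229 records: Pharmacy (may confer medicine, ...
1,390 records: Management
- 850 records: Management Science and Engineering (may...
- 612 records: Library, Information and Archives Manage...
- 169 records: Business Administration
181 records: Law
133 records: Agriculture
55 records: Education
52 records: Literature
51 records: Economics
51 records: Military Science
22 records: Art Studies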

Topic

3,122 records: image processing
2,084 records: image coding
2,020 records: visualization
1,752 records: image segmentati...
1,486 records: feature extracti...
1,081 records: image reconstruc...
907 records: cameras
885 records: signal processin...
833 records: image color anal...
756 records: humans
712 records: image edge detec...
688 records: image enhancemen...
667 records: computer vision
649 records: training
582 records: image analysis
567 records: deep learning
536 records: image quality
481 records: conferences
472 records: object detection
472 records: robustness

Institution

51 records: school of electr...
50 records: shanghai jiao to...
39 records: ieee
38 records: university of sc...
36 records: shanghai jiao to...
36 records: school of comput...
34 records: shanghai jiao to...
33 records: university of ch...
32 records: microsoft resear...
26 records: national institu...
25 records: department of el...
24 records: hendisliğ
23 records: institute for in...
23 records: institute of ima...
23 records: istanbul teknik ...
23 records: institute of dig...
22 records: peking univ inst...
21 records: institute of inf...
21 records: univ chinese aca...
21 records: univ sci & techn...

Author

62 records: guangtao zhai
46 records: song li
45 records: zhai guangtao
32 records: jie yang
27 records: li li
25 records: m. vetterli
25 records: bovik alan c.
25 records: li sumei
25 records: li song
25 records: sarp ertürk
24 records: jing zhang
24 records: b. macq
23 records: zhang lei
23 records: li zhuo
23 records: d.r. bull
22 records: jürgen seiler
21 records: shi guangming
20 records: liu yang
20 records: zhang wenjun
18 records: mohamed-chaker l...

Language

24,740 records: English
489 records: Turkish
209 records: Other
132 records: Chinese
2 records: Spanish
2 records: Portuguese

Search query: "Any field = IEEE Visual Communications and Image Processing Conference"

25,553 records in total; showing records 171-180

Object-Centric Discriminative Learning for Text-Based Person Retrieval

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Haiwen Li, Delong Liu, Fei Su, Zhicheng Zhao. Affiliations: Beijing University of Posts and Telecommunications Beijing Key Laboratory of Network System and Network Culture China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Text-based person retrieval (TBPR) is a vision-language task that aims to find specific pedestrians in a large image gallery using the textual description. However, due to the heterogeneity between modalities and the redundancy in visual representations, it remains a challenging task. Existing methods do not explicitly reduce the influence of the background regions in images, inevitably decreasing representation ability and reducing the image-text matching performance. In this paper, we propose a novel framework for text-based person retrieval, termed Object-Centric Discriminative Learning (OCDL), which incorporates person masks to indicate attentive regions, thereby enhancing the model’s focus on the pedestrians in images while suppressing the background noise. Additionally, a novel crossmodal matching loss, namely Soft Angular Distribution Matching (SADM), is introduced to learn discriminative visual and textual representations. Extensive experiments on three widely-used TBPR datasets demonstrate the effectiveness of our approach. The code is available at https://***/JThuge/OCDL.

Keywords: Visualization; Pedestrians; Codes; Redundancy; Signal processing; Performance gain; Benchmark testing; Feature extraction; Speech processing; Cross modal retrieval

VisTa: Visual-contextual and Text-augmented Zero-shot Object-level OOD Detection

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Bin Zhang, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang. Affiliations: Wuhan National Laboratory for Optoelectronics Huazhong University of Science and Technology Wuhan China Ping An Technology (Shenzhen) Co. Ltd Shenzhen China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

As object detectors are increasingly deployed as black-box cloud services or pre-trained models with restricted access to the original training data, the challenge of zero-shot object-level out-of-distribution (OOD) detection arises. This task becomes crucial in ensuring the reliability of detectors in open-world settings. While existing methods have demonstrated success in image-level OOD detection using pre-trained vision-language models like CLIP, directly applying such models to object-level OOD detection presents challenges due to the loss of contextual information and reliance on image-level alignment. To tackle these challenges, we introduce a new method that leverages visual prompts and text-augmented in-distribution (ID) space construction to adapt CLIP for zero-shot object-level OOD detection. Our method preserves critical contextual information and improves the ability to differentiate between ID and OOD objects, achieving competitive performance across different benchmarks.

Keywords: Training; Visualization; Adaptation models; Training data; Detectors; Benchmark testing; Signal processing; Reliability; Speech processing; Context modeling

LV-ReID: Large Language-Vision Alignment Model for Text-based Person Re-identification

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Yinghui Xia, Chao Wang, Jinsong Yang. Affiliations: HKUST(GZ) Wuhan University AutoAgents.ai

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Person Re-Identification (ReID) is a critical task in computer vision that involves identifying individuals across different cameras or video frames. It’s challenging due to variations in appearance, lighting, viewpoints, clothing, and occlusions. Text-based ReID adds complexity by requiring image retrieval or individual identification from a dataset based on text queries. The BLIP-2 model addresses these challenges by combining multi-modal alignment and matching into a single framework, using a pre-trained vision-language model with a Q-Former component to bridge the visual and textual modalities. This approach significantly boosts performance in multi-modal tasks and information retrieval, especially with large datasets. The LV-ReID framework, which incorporates BLIP-2, enhances text-based ReID by integrating retrieval and generation tasks. The experimental results show BLIP-2’s effectiveness in aligning and matching pedestrian images for information retrieval tasks. It demonstrates proficiency in multi-modal tasks and offers an efficient solution for text-based ReID by fusing visual and textual data, improving pedestrian identification accuracy in complex environments.

Keywords: Visualization; Pedestrians; Image retrieval; Streaming media; Signal processing; Information retrieval; Security; Speech processing; Standards; Identification of persons

Adapting Without Seeing: Text-Aided Domain Adaptation for Adapting CLIP-like Models to Novel Domains

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Louis Hémadou, Héléna Vorobieva, Ewa Kijak, Frédéric Jurie. Affiliations: Digital Sciences & Technologies Department Safran Tech Université de Rennes IRISA INRIA CNRS Université de Caen Normandie ENSICAEN CNRS

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

This paper addresses the challenge of adapting large vision models, such as CLIP, to domain shifts in image classification tasks. While these models, pre-trained on vast datasets like LAION 2B, offer powerful visual representations, they may struggle when applied to domains significantly different from their training data, such as industrial applications. We introduce TADA, a Text-Aided Domain Adaptation method that adapts the visual representations of these models to new domains without requiring target domain images. TADA leverages verbal descriptions of the domain shift to capture the differences between the pre-training and target domains. Our method integrates seamlessly with fine-tuning strategies, including prompt learning methods. We demonstrate TADA’s effectiveness in improving the performance of large vision models on domain-shifted data, achieving state-of-the-art results on benchmarks like DomainNet.

Keywords: Learning systems; Adaptation models; Visualization; Training data; Signal processing; Benchmark testing; Data models; Acoustics; Speech processing; Image classification

Minimizing Disparities between Real and Pseudo Queries for Unsupervised Visual Grounding

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Hui Jiang, Changkai Ji, Jilan Xu, Yanhao Zhu, Yuejie Zhang, Rui Feng, Tao Zhang, Shang Gao. Affiliations: School of Computer Science Shanghai Key Laboratory of Intelligent Information Processing Fudan University Shanghai China School of Information Management and Engineering Shanghai Key Laboratory of Financial Information Technology Shanghai University of Finance and Economics Shanghai China School of Information Technology Deakin University Victoria Australia

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Visual grounding involves the identification and localization of image regions given textual descriptions. To reduce the manual labeling effort on region-text pairs, unsupervised visual grounding aims to generate pseudo bounding box and query pairs for training grounding models. However, there exists significant disparities between real and pseudo queries in terms of object, attribute distributions, and textual formats, limiting the generalization performance of unsupervised grounding methods. To address this challenge, we propose a novel unsupervised visual grounding framework. During training, we prompt Multimodal Large Language Models to generate pseudo queries, in which the entities are beyond the object detector’s pre-defined limited categories, and are associated with richer attributes. We further devise a Modifier Tree structure to bridge the gap of textual format between real and pseudo queries. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art unsupervised approaches on public benchmark datasets, particularly when dealing with complex queries.

Keywords: Training; Location awareness; Visualization; Limiting; Grounding; Large language models; Manuals; Labeling; Speech processing; Image reconstruction

Bridging the Machine-Human Gap in Blurred-Image Classification via Entropy Maximisation

International Image Processing, Applications and Systems Conference (IPAS)

Authors: Emilio Sansano-Sansano, Marina Martínez-García, Javier Portilla. Affiliations: INIT Universitat Jaume I Castellón de la Plana Spain IMAC Universitat Jaume I Castellón de la Plana Spain Instituto de Óptica CSIC Madrid Spain

ISBN (digital): 9798331506520

ISBN (print): 9798331506537

Recent studies point to an accuracy gap between humans and Artificial Neural Network (ANN) models when classifying blurred images, with humans outperforming ANNs. To bridge this gap, we introduce a spectral channel-based range-constrained entropy merit function, from which we devise a zero-phase, circular symmetric blind deblurring method. We apply it as a pre-processing step for image classification and test it using pre-trained classification models and images blurred by Gaussian kernels. We compare our method to state-of-the-art restoration methods, showing its superiority, effectively bridging the machine-human gap for most models and blur levels. Our results also rank higher than the competitors in no-reference and full-reference image quality metrics. Notwithstanding the limitation to zero-phase blur, this work shows that, for image pre-processing aimed at visual tasks, it may be advantageous to use merit functions based on vision science and information theory, rather than on the expected error to the latent image.

Keywords: Measurement; Image quality; Visualization; Computer vision; Artificial neural networks; Entropy; Kernel; Optimization; Information theory; Image classification

A Critical Assessment of Visual Sound Source Localization Models Including Negative Audio

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Xavier Juanola, Gloria Haro, Magdalena Fuentes. Affiliations: Universitat Pompeu Fabra Barcelona Spain MARL-IDM New York University New York USA

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

The task of Visual Sound Source Localization (VSSL) involves identifying the location of sound sources in visual scenes, integrating audio-visual data for enhanced scene understanding. Despite advancements in state-of-the-art (SOTA) models, we observe three critical flaws: i) The evaluation of the models is mainly focused on sounds produced by objects that are visible in the image, ii) The evaluation often assumes a prior knowledge of the size of the sounding object, and iii) No universal threshold for localization in real-world scenarios is established, as previous approaches only consider positive examples without accounting for both positive and negative cases. In this paper, we introduce extended test sets and new metrics designed to complete the current standard evaluation of VSSL models by testing them in scenarios where none of the objects in the image corresponds to the audio input, i.e. a negative audio. We consider three types of negative audio: silence, noise and offscreen. Our analysis reveals that numerous SOTA models fail to appropriately adjust their predictions based on audio input, suggesting that these models may not be leveraging audio information as intended. Additionally, we provide a comprehensive analysis of the range of maximum values in the estimated audio-visual similarity maps, in both positive and negative audio cases, and show that most of the models are not discriminative enough, making them unfit to choose a universal threshold appropriate to perform sound localization without any a priori information of the sounding object, that is, object size and visibility.

Keywords: Location awareness; Measurement; Visualization; Analytical models; Noise; Predictive models; Speech processing; Standards; Testing; Videos

CoF: Coarse to Fine-Grained Image Understanding for Multi-modal Large Language Models

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Yeyuan Wang, Dehong Gao, Bin Li, Rujiao Long, Lei Yi, Xiaoyan Cai, Libin Yang, Jinxia Zhang, Shanqing Yu, Qi Xuan. Affiliations: School of Automation Northwestern Polytechnical University Xi’an China School of Cybersecurity Northwestern Polytechnical University Xi’an China Alibaba Group Hangzhou China The Key Laboratory of Measurement and Control of CSE Ministry of Education School of Automation Southeast University Nanjing China Advanced Ocean Institute of Southeast University Nantong China Zhejiang University of Technology Hangzhou China Binjiang Institute of Artificial Intelligence Hangzhou China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

The impressive performance of Large Language Model (LLM) has prompted researchers to develop Multi-modal LLM (MLLM), which has shown great potential for various multi-modal tasks. However, current MLLM often struggles to effectively address fine-grained multi-modal challenges. We argue that this limitation is closely linked to the models’ visual grounding capabilities. The restricted spatial awareness and perceptual acuity of visual encoders frequently lead to interference from irrelevant background information in images, causing the models to overlook subtle but crucial details. As a result, achieving fine-grained regional visual comprehension becomes difficult. In this paper, we break down multi-modal understanding into two stages, from Coarse to Fine (CoF). In the first stage, we prompt the MLLM to locate the approximate area of the answer. In the second stage, we further enhance the model’s focus on relevant areas within the image through visual prompt engineering, adjusting attention weights of pertinent regions. This, in turn, improves both visual grounding and overall performance in downstream tasks. Our experiments show that this approach significantly boosts the performance of baseline models, demonstrating notable generalization and effectiveness. Our CoF approach is available online at https://***/Gavin001201/CoF.

Keywords: Location awareness; Visualization; Image recognition; Grounding; Large language models; Interference; Signal processing; Prompt engineering; Speech processing; Visual perception

Enhanced Satellite Image Fusion Using Deep Learning and Feature Extraction Techniques: A Survey

1st International Conference on Intelligent Systems in Computing and Communications, ISCComm 2023

Authors: Nallagachu, Swathi; Sandanalakshmi, R. Affiliations: Department of Electronics and Communication Engineering Puducherry Technological University Puducherry India

ISBN (print): 9783031756047

This paper presents an overview and analysis of numerous research projects on image fusion methods, with a particular emphasis on deep learning-based methods. The research analyses the inadequacies of current fusion models and suggests novel methods for feature extraction, visual sensor networks, remote sensing applications, medical imaging, and multi-resolution image fusion. The study demonstrates the advantages of deep learning techniques for image fusion tasks, including Convolutional Neural Networks (CNNs), Stacked Auto encoders, and Convolutional Sparse Representation (CSR). These methods provide better fusion quality, fast processing, and better visual perception. Numerous studies provide novel methods, such as the rapid Integer Lifting Wavelet Transform (ILWT), Dual-Tree Complex Contourlet Transform (DT-CCT), and hybrid fusion methods that combine Discrete Cosine Transform (DCT) and Integer Lifting Wavelet Transform (ILWT) techniques. The review literature survey shows how these techniques can improve fusion outcomes while introducing little information distortion. In general, this paper brings together improvements in feature extraction and image fusion techniques, demonstrating the strength and potential of deep learning-based approaches. It is a useful tool for academics and professionals working in various fields who want to comprehend and advance these fields of study. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Image fusion

VitaCap: A Vision Transformer-Based Framework for Image Captioning

29th International Computer Conference, Computer Society of Iran, CSICC 2025

Authors: Nia, Amirhossein Hossein; Feizi, Fatemehzahra; Ahmadi, Ali. Affiliations: School of Computer Engineering K. N. Toosi University of Technology Tehran Iran School of Computer Engineering Iran University of Science and Technology Tehran Iran

ISBN (print): 9798331523114

Automatic image captioning, which involves generating textual descriptions from visual content, is a challenging and multidisciplinary task combining computer vision and natural language processing. This paper introduces VitaCap (Vision Transformer for Captioning), a transformer-based encoder-decoder architecture designed for effective image caption generation. The model integrates multiple feature extraction techniques: U-Net [1] for pixel features, Graph Convolutional Networks (GCNs) [2] for grid features, and Faster R-CNN [3] for region-based features, providing a rich and comprehensive visual representation. These features are processed by the transformer encoder to capture complex dependencies, which are then utilized by the decoder to generate meaningful and contextually relevant captions. Experimental results demonstrate that VitaCap achieves promising performance across various image datasets, highlighting its potential as a robust solution for image captioning tasks. © 2025 IEEE.

Keywords: Computer vision; Image captioning; Multi features fusion; Transformers


235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
