Search Results - Inner Mongolia University Library


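Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (publication year, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016 (subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016 (publication "Computer Applications and Software", any field containing C++ or Basic, subject term not Visual, years 2011 to 2016)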
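
The field codes and operator rules in the help text can be sketched as a small query builder. This is only an illustration of the documented syntax: the helper names (`term`, `combine`, `group`, `year_range`, `build_example_one`) are hypothetical and are not part of the library's system, and the sketch only assembles the query string without sending it anywhere.

```python
# Field codes and operators exactly as listed in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator that has one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}: {operator!r}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


def year_range(start: int, end: int) -> str:
    """Return a Y=start-end term, as used in the help-text examples."""
    return term("Y", f"{start}-{end}")


def build_example_one() -> str:
    """Rebuild Example 1 from the help text (hypothetical helper)."""
    subjects = group(combine("OR", term("K", "图书馆学"), term("K", "情报学")))
    return combine("AND", subjects, term("A", "范并思"), year_range(1982, 2016))


if __name__ == "__main__":
    print(build_example_one())
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase, and the space-padded join mirrors the rule that each operator needs one space on both sides.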


Refine search results

Document type

25,252 conference papers
277 journal articles
21 books
3 theses

Collection scope

25,553 electronic documents
0 print holdings


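Discipline classification

15,800 records: Engineering
- 9,866 records: Computer Science and Technology...
- 6,079 records: Electrical Engineering
- 5,771 records: Information and Communication Engineering
- 5,615 records: Software Engineering
- 2,016 records: Optical Engineering
- 1,453 records: Control Science and Engineering
- 1,280 records: Mechanical Engineering
- 1,155 records: Electronic Science and Technology (may...
- 873 records: Biomedical Engineering (may be awarded...
- 833 records: Bioengineering
- 793 records: Instrument Science and Technology
- 265 records: Cyberspace Security
- 253 records: Chemical Engineering and Technology
- 245 records: Safety Science and Engineering
- 239 records: Transportation Engineering
- 183 records: Materials Science and Engineering (may...
- 162 records: Civil Engineering
- 159 records: Architecture
5,716 records: Science
- 3,480 records: Physics
- 2,207 records: Mathematics
- 886 records: Biology
- 564 records: Statistics (may be awarded in Science, ...
- 420 records: Systems Science
- 310 records: Chemistry
3,023 records: Medicine
- 2,897 records: Clinical Medicine
- 312 records: Basic Medicine (may be awarded in Medicine...
- 229 records: Pharmacy (may be awarded in Medicine, Science...
1,390 records: Management
- 850 records: Management Science and Engineering (may...
- 612 records: Library, Information and Archives Manage...
- 169 records: Business Administration
181 records: Law
133 records: Agriculture
55 records: Education
52 records: Literature
51 records: Economics
51 records: Military Science
22 records: Art Studies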

Subject

3,122 records: image processing
2,084 records: image coding
2,020 records: visualization
1,752 records: image segmentati...
1,486 records: feature extracti...
1,081 records: image reconstruc...
907 records: cameras
885 records: signal processin...
833 records: image color anal...
756 records: humans
712 records: image edge detec...
688 records: image enhancemen...
667 records: computer vision
649 records: training
582 records: image analysis
567 records: deep learning
536 records: image quality
481 records: conferences
472 records: object detection
472 records: robustness

Institution

51 records: school of electr...
50 records: shanghai jiao to...
39 records: ieee
38 records: university of sc...
36 records: shanghai jiao to...
36 records: school of comput...
34 records: shanghai jiao to...
33 records: university of ch...
32 records: microsoft resear...
26 records: national institu...
25 records: department of el...
24 records: hendisliğ
23 records: institute for in...
23 records: institute of ima...
23 records: istanbul teknik ...
23 records: institute of dig...
22 records: peking univ inst...
21 records: institute of inf...
21 records: univ chinese aca...
21 records: univ sci & techn...

Author

62 records: guangtao zhai
46 records: song li
45 records: zhai guangtao
32 records: jie yang
27 records: li li
25 records: m. vetterli
25 records: bovik alan c.
25 records: li sumei
25 records: li song
25 records: sarp ertürk
24 records: jing zhang
24 records: b. macq
23 records: zhang lei
23 records: li zhuo
23 records: d.r. bull
22 records: jürgen seiler
21 records: shi guangming
20 records: liu yang
20 records: zhang wenjun
18 records: mohamed-chaker l...

Language

24,740 records: English
489 records: Turkish
209 records: Other
132 records: Chinese
2 records: Spanish
2 records: Portuguese

Search query: "Any field = IEEE Visual Communications and Image Processing Conference"

25,553 records in total; showing records 221-230.

Non-Autoregressive Multimodal Machine Translation

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Guojing Liu, Xiangqian Ding, Huili Gong, Xiangyu Qu, Zhenyu Yang, Kai Yan

Affiliations: Faculty of Information Science and Engineering, Ocean University of China, Qingdao, China; School of Cyber Science and Technology, Shandong University, Qingdao, China; Shandong Academy of Sciences, Qilu University of Technology, Jinan, China; Laiwu Vocational and Technical College, Jinan, China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Performing better text translation by integrating auxiliary inputs from visual information has gained widespread attention in recent years. While existing methods outperform the text-only translation models, the step-by-step generative style reduces the inference speed, which limits their applicability in real-world scenarios. In this paper, we propose the non-autoregressive language model (NA-LM) for multimodal machine translation. With NA-LM, we develop a Non-Autoregressive Multimodal Transformer (NA-MMT), which accelerates the generative translation via a parallel multimodal decoder. To retain the translation performance, we improve the NA-MMT in twofold: 1) We preprocess the image into a refined sequence of visual entities with length encoding to reduce irrelevant information; 2) We design cross fertility and cross-modal gate attention for multimodal decoder to enhance the generative quality. Experiments on Multi30k datasets show the NA-MMT can generate high-quality translation with over 11× speedup than the baselines, which is strongly competitive.

Keywords: Visualization; Translation; Image coding; Logic gates; Signal processing; Transformers; Encoding; Decoding; Machine translation; Speech processing

Explore the Hallucination on Low-level Perception for MLLMs

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Yinan Sun, Zicheng Zhang, Haoning Wu, Xiaohong Liu, Weisi Lin, Guangtao Zhai, Xiongkuo Min

Affiliations: Shanghai Jiao Tong University; S-Lab, Nanyang Technological University; Nanyang Technological University; Shanghai Jiao Tong University, Shanghai, China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

The rapid development of Multi-modality Large Language Models (MLLMs) has significantly influenced various aspects of industry and daily life, showcasing impressive capabilities in visual perception and understanding. However, these models also exhibit hallucinations, which limit their reliability as AI systems, especially in tasks involving low-level visual perception and understanding. We believe that hallucinations stem from a lack of explicit self-awareness in these models, which directly impacts their overall performance. In this paper, we aim to define and evaluate the self-awareness of MLLMs in low-level visual perception and understanding tasks. To this end, we present QL-Bench, a benchmark settings to simulate human responses to low-level vision, investigating self-awareness in low-level visual perception through visual question answering related to low-level attributes such as clarity and lighting. Specifically, we construct the LLSAVisionQA dataset, comprising 2,990 single images and 1,999 image pairs, each accompanied by an open-ended question about its low-level features. Through the evaluation of 15 MLLMs, we demonstrate that while some models exhibit robust low-level visual capabilities, their self-awareness remains relatively underdeveloped. Notably, for the same model, simpler questions are often answered more accurately than complex ones. However, self-awareness appears to improve when addressing more challenging questions. We hope that our benchmark will motivate further research, particularly focused on enhancing the self-awareness of MLLMs in tasks involving low-level visual perception and understanding.

Keywords: Training; Visualization; Large language models; Lighting; Benchmark testing; Signal processing; Question answering (information retrieval); Reliability; Speech processing; Visual perception

SentiFormer: Metadata Enhanced Transformer for Image Sentiment Analysis

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Bin Feng, Shulan Ruan, Mingzheng Yang, Dongxuan Han, Huijie Liu, Kai Zhang, Qi Liu

Affiliations: University of Science and Technology of China; State Key Laboratory of Cognitive Intelligence; Shenzhen International Graduate School, Tsinghua University

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

As more and more internet users post images online to express their daily emotions, image sentiment analysis has attracted increasing attention. Recently, researchers generally tend to design different neural networks to extract visual features from images for sentiment analysis. Despite the significant progress, metadata, the data (e.g., text descriptions and keyword tags) for describing the image, has not been sufficiently explored in this task. In this paper, we propose a novel Metadata Enhanced Transformer for sentiment analysis (SentiFormer) to fuse multiple metadata and the corresponding image into a unified framework. Specifically, we first obtain multiple metadata of the image and unify the representations of diverse data. To adaptively learn the appropriate weights for each metadata, we then design an adaptive relevance learning module to highlight more effective information while suppressing weaker ones. Moreover, we further develop a cross-modal fusion module to fuse the adaptively learned representations and make the final prediction. Extensive experiments on three publicly available datasets demonstrate the superiority and rationality of our proposed method.

Keywords: Sentiment analysis; Visualization; Analytical models; Fuses; Neural networks; Metadata; Signal processing; Transformers; Feature extraction; Speech processing

Structural-Aware Disentangled Learning with CLIP for Hyperbolic Zero-Shot Sketch-Based Image Retrieval

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Qing Zhang, Jing Zhang, Feilong Bao, Xiangdong Su, Guanglai Gao

Affiliations: School of Computer Science, Inner Mongolia University, Hohhot, China; Inner Mongolia Key Laboratory of Multilingual Artificial Intelligence Technology, Hohhot, China; National and Local Joint Engineering Research Center of Mongolian Information Processing Technology, Hohhot, China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

The zero-shot sketch-based image retrieval task faces two key challenges: domain gap and knowledge transfer. Our innovation is recognizing that directly aligning cross-domain features weakens the discriminative ability of the model, as it overlooks the asymmetry between sketches and images. Additionally, Euclidean space is inadequate for capturing the hierarchical structure, which limits the performance of the model on complex data. To address these issues, we propose a Structural-Aware Disentangled Learning network (termed SADLnet) that incorporates CLIP and hyperbolic geometry. Specifically, we use CLIP to extract visual features from each domain to enhance the domain generalization of the model. Furthermore, we design a structure-guided disentanglement strategy to decompose image representations into sketch-related and sketch-unrelated features, addressing the domain gap. Moreover, we project the retrieval features into hyperbolic space to capture hierarchical information, improving feature discrimination in retrieval tasks. Extensive experiments demonstrate that SADLnet establishes new state-of-the-art performance on three datasets.

Keywords: Geometry; Adaptation models; Visualization; Technological innovation; Disentangled representation learning; Image retrieval; Signal processing; Feature extraction; Data models; Speech processing

OSLO-IC: On-the-Sphere Learned Omnidirectional Image Compression with Attention Modules and Spatial Context

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Paul Wawerek-López, Navid Mahmoudian Bidgoli, Pascal Frossard, André Kaup, Thomas Maugey

Affiliations: Multimedia Communications and Signal Processing, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany; Trimble Inc.; École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; Institut National de Recherche en Informatique et en Automatique (INRIA), Rennes, France

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Developing effective 360-degree (spherical) image compression techniques is crucial for technologies like virtual reality and automated driving. This paper advances the state-of-the-art in on-the-sphere learning (OSLO) for omnidirectional image compression framework by proposing spherical attention modules, residual blocks, and a spatial autoregressive context model. These improvements achieve a 23.1% bit rate reduction in terms of WS-PSNR BD rate. Additionally, we introduce a spherical transposed convolution operator for upsampling, which reduces trainable parameters by a factor of four compared to the pixel shuffling used in the OSLO framework, while maintaining similar compression performance. Therefore, in total, our proposed method offers significant rate savings with a smaller architecture and can be applied to any spherical convolutional application.

Keywords: Solid modeling; Image coding; Convolution; Computational modeling; Bit rate; Computer architecture; Virtual reality; Transformers; Speech processing; Context modeling

U-SAM: Upgrade Segment Anything Model With Semantic-Aware and Memory-Efficient

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Xiaofeng Jin, Jie Hu, Jianghang Lin, Shengchuan Zhang, Liujuan Cao

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, P.R. China; Learning and Vision Lab, National University of Singapore, Singapore

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Segment Anything Model (SAM) has achieved remarkable success in the field of class-agnostic image segmentation by utilizing points or boxes as prompts. However, we identify two significant limitations when compared to traditional image segmentation models: (1) Trained in a category-agnostic interactive segmentation manner, SAM lacks the ability to discern object granularity and semantics, rendering it ineffective for traditional instance, semantic, and panoptic segmentation tasks. (2) SAM’s inefficient use of instance-independent visual features and tokens necessitates maintaining unique features and tokens for each instance, leading to excessive GPU memory consumption and diminished segmentation efficiency. To address these issues, we propose the Universal Segment Anything Model (U-SAM), a semantic-aware and memory-efficient segmentation model designed to perform both promptable and traditional segmentation tasks within a compact and unified framework. Specifically, U-SAM enhances SAM by integrating the Multi-Scale Semantic-Aware Image Encoder (S2IE), thus providing multi-scale semantic features for achieving traditional image segmentation tasks. Additionally, U-SAM is equipped with a Twin Token Mask Decoder (T2MD) which reduces GPU memory overhead by substituting replicated visual features with replicated tokens. Extensive experiments across interactive, instance, semantic, and panoptic segmentation demonstrate U-SAM’s promising results. Notably, U-SAM is 9× smaller and 10× faster than SAM, showing strong performance in zero-shot segmentation. Moreover, U-SAM surpasses the SOTA object-prompter-based model, RSPrompter, by achieving a 6.2% increase in PQ, operating 14× faster, and cutting training memory usage by 61%.

Keywords: Training; Image segmentation; Visualization; Semantics; Graphics processing units; Signal processing; Rendering (computer graphics); Decoding; Object recognition; Speech processing

PointActionCLIP: Preventing Transfer Degradation in Point Cloud Action Recognition with a Triple-Path CLIP

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Wei Tao, Shenglin He, Xiaoyang Qu, Jiguang Wan, Jianzong Wang

Affiliations: Huazhong University of Science and Technology, Wuhan, China; Ping An Technology (Shenzhen) Co. Ltd, Shenzhen, China

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Directly applying CLIP to point cloud action recognition can cause severe accuracy collapse. In this paper, we propose PointActionCLIP, which successfully prevents this transfer degradation with a triple-path CLIP, including the image path, the sequence path, and the label path. Specifically, the image path projects the 3D point cloud sequence onto a 2D image sequence and uses a visual encoder to extract its feature. It also captures the temporal feature of the image sequence with a temporal encoding transformer. The sequence path adopts a pretrained sequence encoder to encode the original point cloud sequence to obtain its spatiotemporal feature. The label path encodes the candidate labels with a text encoder. Finally, we fuse the output of the three paths to obtain the predicted action label. Extensive experiments validate that PointActionCLIP outperforms state-of-the-art (SOTA) methods.

Keywords: Point cloud compression; Degradation; Visualization; Accuracy; Three-dimensional displays; Fuses; Speech recognition; Transformers; Feature extraction; Image sequences

A Computer Vision and Vibrohaptic Glove-Based Piano Learning System for the Visually Impaired

International Conference on Advanced Communication Technology (ICACT)

Authors: Ian Juha Cho, Jin Park, Hosung Bae

Affiliation: Hankuk Academy of Foreign Studies, Yongin, South Korea

ISBN (digital): 9791188428137

ISBN (print): 9798331507602

The visually impaired are unable to enjoy leisure activities as much as ordinary people due to various limitations. To expand the scope of leisure activities for the visually impaired, we have developed a vibration glove-based system that helps with piano learning. Previous research used 88 infrared light-emitting diodes and gloves with infrared receivers to provide feedback to the user, but this method had many limitations. In particular, the inconvenient user experience and low accuracy were the biggest problems. Our method solves both problems using a camera and an image processing algorithm. As a result of testing the model on 20 piano images, it was shown that all keys were perfectly recognized in 75% of cases, and the gloves could be comfortably used in practice without any difficulty. Thus, our method presents a simpler user experience for the visually impaired, without requiring any special modifications to the piano.

Keywords: Vibrations; Learning systems; Computer vision; Image recognition; Visual impairment; Receivers; Light emitting diodes; User experience; Communications technology; Testing

Enhancing Vision: Harmonizing Frequency for Imaging Quality and Perception Accuracy

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Hongyang Chen, Kaisheng Ma

Affiliations: Xi’an Jiaotong University; Tsinghua University

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

In low-level vision tasks, achieving harmony between visual quality and recognition accuracy is often challenging, as the two do not always align. Many existing approaches focus on optimizing downstream tasks by linking image quality to machine perception, typically incurring additional burdens such as extensive annotations and joint training. In this work, we demonstrate that independent low-level reconstruction algorithms can simultaneously enhance imaging quality and downstream perception accuracy. By conducting a comprehensive frequency-domain analysis, we identify high-frequency components as critical for both visual and perceptual tasks. To counteract the frequency information loss typically seen in ISP pipelines, we propose a Multi-Frequency Fusion Block (MFFB) for on-the-fly upsampling, alongside a Frequency-Aware Supervision (FAS) mechanism guided by discrete wavelet transform. Our method achieves a notable +0.32 dB improvement in smart ISP performance on the Zurich dataset. Moreover, without relying on assistance from downstream tasks, our approach demonstrates significant improvements in object detection and instance segmentation.

Keywords: Instance segmentation; Training; Visualization; Accuracy; Frequency-domain analysis; Signal processing algorithms; Imaging; Object detection; Transforms; Speech processing

From Pixels to Voice: A Simple and Efficient End-to-End Spoken Image Description Approach via Vision Codec Language Models

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Chung Tran, Sakriani Sakti

Affiliation: Graduate School of Science and Technology, Nara Institute of Science and Technology, Ikoma, Japan

ISBN (digital): 9798350368741

ISBN (print): 9798350368758

Neural audio codecs provide a powerful tool for compressing audio signals into discrete codec representations. This compact discrete representation has made it possible to successfully apply a natural language processing (NLP) model to various audio and speech processing tasks, including text-to-speech (e.g., VALL-E, VALL-E X) and multimodal audio-text generation (e.g., LauraGPT, VioLA). While these models excel at handling sequential data like text and speech, their potential for processing non-sequential data, such as images, remains unexplored. In this paper, we introduce PixVoxLM, a simple and efficient end-to-end framework that combines vision-language models with neural audio codecs to tackle the Image-to-Speech (I2S) problem. Experiments on the Flickr8k dataset demonstrate that PixVoxLM delivers promising results compared to existing I2S methods. Furthermore, this research is the first to explore a new capability: visual-guided speech completion in I2S model, paving the way for new practical applications in everyday communication, such as speech prompt-based instruction.

Keywords: Codecs; Speech coding; Signal processing; Natural language processing; Data models; Acoustics; Text to speech; Speech processing; Synthetic aperture sonar


Address: 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
