Search results - Inner Mongolia University Library

AMTrans: Auto-Correlation Multi-Head Attention Transformer for Infrared Spectral Deconvolution

Tsinghua Science and Technology, 2025, vol. 30, no. 3, pp. 1329-1341

Authors: Lei Gao, Liyuan Cui, Shuwen Chen, Lizhen Deng, Xiaokang Wang, Xiaohong Yan, Hu Zhu. Affiliations: College of Electronic and Optical Engineering & College of Flexible Electronics (Future Technology), Nanjing University of Posts and Telecommunications, Nanjing 210003, China; Jiangsu Province Key Lab on Image Processing and Image Communication, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; National Engineering Research Center of Communication and Network Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China; School of Computer Science and Technology, Hainan University, Haikou 570228, China

Infrared spectroscopy analysis has found widespread applications in various fields due to advancements in technology and industry *** improve the quality and reliability of infrared spectroscopy signals, deconvolution is a crucial preprocessing *** by the transformer model, we propose an Auto-correlation Multi-head attention Transformer (AMTrans) for infrared spectrum sequence *** auto-correlation attention model improves the scaled dot-product attention in the *** utilizes attention mechanism for feature extraction and implements attention computation using the auto-correlation *** auto-correlation attention model is used to exploit the inherent sequence nature of spectral data and to effectively recover spectra by capturing auto-correlation patterns in the *** proposed model is trained using supervised learning and demonstrates promising results in infrared spectroscopic *** comparing the experiments with other deconvolution techniques, the experimental results show that the method has excellent deconvolution performance and can effectively recover the texture details of the infrared spectrum.

Keywords: spectroscopy; spectral deconvolution; transformer; auto-correlation mechanism
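As a rough illustration of the auto-correlation idea named in the abstract above (not the AMTrans model itself), the sketch below computes the normalized auto-correlation of a one-dimensional sequence at several lags. The function name `autocorrelation` and the toy periodic signal are assumptions made for illustration only; the record does not describe how the paper implements its attention computation.

```python
def autocorrelation(seq, max_lag):
    """Return the normalized auto-correlation of seq for lags 0..max_lag.

    Hypothetical helper for illustration; requires max_lag < len(seq).
    """
    n = len(seq)
    mean = sum(seq) / n
    centered = [x - mean for x in seq]
    energy = sum(c * c for c in centered)
    if energy == 0:
        # A constant sequence has no variation to correlate.
        return [0.0] * (max_lag + 1)
    result = []
    for lag in range(max_lag + 1):
        # Correlate the sequence with a copy of itself shifted by `lag`.
        s = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        result.append(s / energy)
    return result


# Toy periodic "spectrum" with period 4 (assumed data, not from the paper).
signal = [0.0, 1.0, 0.0, -1.0] * 4
acf = autocorrelation(signal, 4)
```

For a periodic input like this one, the auto-correlation peaks again at the lag that matches the period, which is the kind of repeating pattern an auto-correlation mechanism can pick up in sequence data.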


Cascaded Sliding-Window-Based Relativistic GAN Fusion for Perceptual and Consistent Video Super-Resolution


6th IFIP TC 12 International Conference on Intelligence Science, ICIS 2024

Author: Li, Dingyi. Affiliation: PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

ISBN: (print) 9783031712524

Perceptual video super-resolution aims at converting low-resolution videos to visually appealing high-resolution ones. It may lead to temporal inconsistency due to the drastically changing outputs. In this paper, we propose cascaded sliding-window-based relativistic GAN (Generative Adversarial Network) fusion for perceptual and consistent video super-resolution (PC-VSR). Firstly, cascaded sliding-window-based relativistic GAN is designed to extract more useful information. It enlarges the temporal receptive field of sliding-window-based model in each step. It is able to enhance perceptual quality and compensate temporal consistency progressively and sufficiently. The trained separate refinement generator networks are fused into a final refinement generator. The final refinement generator can be calculated recursively at the testing stage. With our generator fusion, the parameter number is reduced and good quality is maintained. Extensive experimental results demonstrate that our approach outperforms state-of-the-art super-resolution methods in terms of perceptual quality. Our method also achieves good temporal consistency and per-pixel accuracy, compared with other perceptual approaches. © IFIP International Federation for Information Processing 2025.

Keywords: Generative adversarial networks


Learning Inverse Laplacian Pyramid for Progressive Depth Completion


arXiv, 2025

Authors: Wang, Kun; Yan, Zhiqiang; Fan, Junkai; Li, Jun; Yang, Jian. Affiliation: PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China

Depth completion endeavors to reconstruct a dense depth map from sparse depth measurements, leveraging the information provided by a corresponding color image. Existing approaches mostly hinge on single-scale propagation strategies that iteratively ameliorate initial coarse depth estimates through pixel-level message passing. Despite their commendable outcomes, these techniques are frequently hampered by computational inefficiencies and a limited grasp of scene context. To circumvent these challenges, we introduce LP-Net, an innovative framework that implements a multi-scale, progressive prediction paradigm based on Laplacian Pyramid decomposition. Diverging from propagation-based approaches, LP-Net initiates with a rudimentary, low-resolution depth prediction to encapsulate the global scene context, subsequently refining this through successive upsampling and the reinstatement of high-frequency details at incremental scales. We have developed two novel modules to bolster this strategy: 1) the Multi-path Feature Pyramid module, which segregates feature maps into discrete pathways, employing multi-scale transformations to amalgamate comprehensive spatial information, and 2) the Selective Depth Filtering module, which dynamically learns to apply both smoothness and sharpness filters to judiciously mitigate noise while accentuating intricate details. By integrating these advancements, LP-Net not only secures state-of-the-art (SOTA) performance across both outdoor and indoor benchmarks such as KITTI, NYUv2, and TOFDC, but also demonstrates superior computational efficiency. At the time of submission, LP-Net ranks 1st among all peer-reviewed methods on the official KITTI leaderboard. The source code will be made publicly accessible upon paper acceptance. © 2025, CC BY-NC-SA.

Keywords: Laplace transforms
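To make the coarse-to-fine idea in the abstract above concrete, here is a minimal sketch of a Laplacian-pyramid-style decomposition on a one-dimensional signal: a low-resolution estimate is built first, and the detail removed at each scale is added back during upsampling. This is a simplified stand-in (pair averaging instead of Gaussian filtering, no learned modules) and is not the LP-Net architecture; all function names are hypothetical.

```python
def down(x):
    """Halve the resolution by averaging neighbouring pairs (len(x) must be even)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]


def up(x):
    """Double the resolution by repeating every sample."""
    return [v for v in x for _ in range(2)]


def build_pyramid(x, levels):
    """Return (coarsest signal, list of residuals from fine to coarse)."""
    residuals = []
    current = list(x)
    for _ in range(levels):
        coarse = down(current)
        # The residual holds the high-frequency detail lost by downsampling.
        residuals.append([a - b for a, b in zip(current, up(coarse))])
        current = coarse
    return current, residuals


def reconstruct(coarse, residuals):
    """Upsample step by step, restoring the stored detail at each scale."""
    current = coarse
    for res in reversed(residuals):
        current = [a + b for a, b in zip(up(current), res)]
    return current


# Toy signal whose length is divisible by 2**levels (assumed data).
signal = [1.0, 3.0, 2.0, 6.0, 5.0, 5.0, 0.0, 4.0]
coarse, residuals = build_pyramid(signal, 2)
restored = reconstruct(coarse, residuals)
```

In a learned setting, the residuals would be predicted rather than stored, but the reconstruction order (coarse estimate first, detail added scale by scale) is the same.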


SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning


arXiv, 2025

Authors: Thoker, Fida Mohammad; Jiang, Letian; Zhao, Chen; Bagad, Piyush; Doughty, Hazel; Ghanem, Bernard; Snoek, Cees G.M. Affiliations: CEMSE, King Abdullah University of Science and Technology, Makkah, Saudi Arabia; VGG, University of Oxford, Oxford, United Kingdom; LIACS, Leiden University, Leiden, Netherlands; Video & Image Sense Lab, University of Amsterdam, Amsterdam, Netherlands

Continued advances in self-supervised learning have led to significant progress in video representation learning, offering a scalable alternative to supervised approaches by eliminating the need for manual annotations. Despite strong performance on standard action recognition benchmarks, existing video self-supervised learning methods are predominantly evaluated within narrow protocols—typically pre-training on Kinetics-400 and finetuning on similar datasets—limiting our understanding of their generalization capabilities in real-world settings. In this work, we present a comprehensive evaluation of modern video self-supervised learning models, focusing on generalization across four key downstream factors: domain shift, sample efficiency, action granularity, and task diversity. Building on our prior work analyzing benchmark sensitivity in CNN-based contrastive learning, we extend the study to cover current state-of-the-art transformer-based video-only and video-text representation models. Specifically, we benchmark 12 transformer-based methods (7 video-only, 5 video-text) and compare them against 10 CNN-based methods, resulting in over 1100 experiments across 8 datasets and 7 downstream tasks. Our analysis reveals that, despite architectural advancements, transformer-based models remain sensitive to downstream conditions. No single method generalizes consistently across all factors;for instance, video-only transformers are more robust to domain shift, CNN-based models perform better on tasks requiring fine-grained temporal reasoning, and video-text transformers underperform both in several downstream settings despite large-scale pretraining. We also observe that recent transformer-based approaches do not universally outperform earlier methods. These findings provide a detailed understanding of the capabilities and limitations of current video self-supervised learning approaches and establish an extended benchmark for evaluating generalization in video representation

Keywords: Self-supervised learning


SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification


IEEE Transactions on Multimedia, 2025

Authors: Wang, Lei; Zhan, Yibing; Ma, Leilei; Tao, Dapeng; Ding, Liang; Gong, Chen. Affiliations: Nanjing University of Science and Technology, Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Jiangsu Key Laboratory of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Jiangsu, Nanjing 210094, China; JD Explore Academy, Beijing 100000, China; Anhui University, School of Computer Science and Technology, Anhui, Hefei 230601, China; Yunnan University, FIST LAB, School of Information Science and Engineering, Yunnan, Kunming 650091, China

Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the considerable discrepancies between single- and multi-label images, i.e., a multi-label image involves multiple co-occurred categories and fickle object scales. On the other hand, previous multi-label image classification (MLIC) methods tend to design elaborate models, bringing expensive computation. In this article, we introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix. The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together. Furthermore, such splice in our SpliceMixed mini-batch enables interactions between mixed images and original regular images. We also provide a simple and non-parametric extension based on consistency learning (SpliceMix-CL) to show the potential of extending our SpliceMix. Extensive experiments on various tasks demonstrate that only using SpliceMix with a baseline model (e.g., ResNet) achieves better performance than state-of-the-art methods. Moreover, the generalizability of our SpliceMix is further validated by the improvements in current MLIC methods when married with our SpliceMix. The code is available at https://***/zuiran/SpliceMix. © 1999-2012 IEEE.

Keywords: image classification
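The grid splice described in the abstract above can be sketched in a few lines. The example below downsamples four equally sized single-channel images and tiles them into a 2 x 2 grid of the original size, then merges their multi-hot labels. It is a plain-Python illustration under stated assumptions, not the authors' released code: the nearest-neighbour downsampling, the label-union rule, and the names `downsample`, `splice_grid`, and `merge_labels` are all hypothetical choices.

```python
def downsample(img, fy, fx):
    """Nearest-neighbour downsampling of a 2D list by integer factors."""
    return [row[::fx] for row in img[::fy]]


def splice_grid(images, rows, cols):
    """Splice rows*cols equally sized images into one grid image."""
    if len(images) != rows * cols:
        raise ValueError("need exactly rows * cols images")
    # Shrink each image so the finished grid matches the original size.
    small = [downsample(im, rows, cols) for im in images]
    out = []
    for r in range(rows):
        tiles = small[r * cols:(r + 1) * cols]
        for y in range(len(tiles[0])):
            line = []
            for tile in tiles:
                line.extend(tile[y])
            out.append(line)
    return out


def merge_labels(label_sets):
    """Union of multi-hot label vectors (assumed rule for the mixed image)."""
    return [int(any(column)) for column in zip(*label_sets)]


# Four constant 4x4 "images" with pixel values 0..3 (assumed toy data).
images = [[[k] * 4 for _ in range(4)] for k in range(4)]
mixed = splice_grid(images, 2, 2)
labels = merge_labels([[1, 0, 0], [0, 1, 0], [0, 0, 0], [1, 0, 0]])
```

Because every source image survives whole at a smaller scale, no object is cut away, which is the property the abstract contrasts with crop-style mixing.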


VIS4SL: A visual analytic approach for interpreting and diagnosing shortcut learning


Knowledge-Based Systems, 2025, vol. 320

Authors: Meng, Xiyu; Tang, Tan; Zhou, Yuhua; Yan, Zihan; Deng, Dazhen; Wang, Yongheng; Wu, Yuhan; Wu, Yingcai. Affiliations: College of Computer Science and Technology, Zhejiang University, No. 38 Zheda Road, Hangzhou 310027, China; The State Key Lab of CAD&CG, Zhejiang University, Hangzhou 310012, China; Laboratory of Art and Archaeology Image, Zhejiang University, Hangzhou 310007, China; Research Center of Big Data Intelligence, Zhejiang Lab, Hangzhou 311100, China

Shortcut learning, a phenomenon where deep neural networks inadvertently learn irrelevant features, has been extensively discussed due to its impact on model generalization and unexpected failures. Interpreting and diagnosing shortcut learning is challenging due to its diverse manifestations and multiple influencing factors. To assist data scientists in these tasks, we introduce VIS4SL, an interactive visual analytics approach that harnesses both human intelligence and computational power. VIS4SL integrates a perturbation-based method with comprehensive visualizations to facilitate an understandable analysis of learned features. We also present a set of comparative visualizations that allow for the evaluation of model explanations against robust proxies, particularly human explanations, to quantify the degree of shortcut learning and assess model components. Two case studies, involving natural image classification and visualization classification, demonstrate the efficacy of VIS4SL in practical applications. Our findings reveal that the model uses the orientation of bars to differentiate between bar charts and Pareto charts. Furthermore, we explore how interactive visualizations enhance data scientists’ understanding of shortcut learning, enabling the development of more precise deep learning models. © 2025 Elsevier B.V.

Keywords: Video analysis


ScanDTM: A Novel Dual-Temporal Modulation Scanpath Prediction Model for Omnidirectional Images


IEEE Transactions on Circuits and Systems for Video Technology, 2025

Authors: Zhu, Dandan; Zhang, Kaiwei; Min, Xiongkuo; Zhai, Guangtao; Yang, Xiaokang. Affiliations: East China Normal University, School of Computer Science and Technology, Shanghai 200333, China; Shanghai Jiao Tong University, Institute of Image Communication and Network Engineering, Shanghai 200240, China; Shanghai Jiao Tong University, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai 200240, China

Scanpath prediction for omnidirectional images aims to effectively simulate the human visual perception mechanism to generate dynamic realistic fixation trajectories. However, the majority of scanpath prediction methods for omnidirectional images are still in their infancy as they fail to accurately capture the time-dependency of viewing behavior and suffer from sub-optimal performance along with limited generalization capability. A desirable solution should achieve a better trade-off between prediction performance and generalization ability. To this end, we propose a novel dual-temporal modulation scanpath prediction (ScanDTM) model for omnidirectional images. Such a model is designed to effectively capture long-range time-dependencies between various fixation regions across both internal and external time dimensions, thereby generating more realistic scanpaths. In particular, we design a Dual Graph Convolutional Network (Dual-GCN) module comprising a semantic-level GCN and an image-level GCN. This module serves as a robust visual encoder that captures spatial relationships among various object regions within an image and fully utilizes similar images as complementary information to capture similarity relations across relevant images. Notably, the proposed Dual-GCN focuses on modeling temporal correlations from both local and global perspectives within the internal time dimension. Furthermore, drawing inspiration from the promising generalization capabilities of diffusion models across various generative tasks, we introduce a novel diffusion-guided saliency module. This module formulates the prediction issue as a conditional generative process for the saliency map, utilizing extracted semantic-level and image-level visual features as conditions. With the well-designed diffusion-guided saliency module, our proposed ScanDTM model acting as an external temporal modulator, we can progressively refine the generated scanpath from the noisy map. We conduct extensive expe

Keywords: Prediction models


Holographic 3D display system using holographic optical element


Proceedings of SPIE - The International Society for Optical Engineering, 1999, vol. 3637, pp. 78-83

Authors: Yamasaki, Koji; Okamoto, Masaaki; Ando, Takahisa; Kitagawa, Tetuya; Shimizu, Eiji. Affiliation: Lab Image Information Science and Technology, Toyonaka, Japan

In recent years, the study of 3D displays has developed rapidly, and researchers have proposed many methods. Holography is the best of these methods, but developing holographic movies will remain difficult for some time. At present, the stereogram method is the one likely to become practicable in the near future, and it can easily produce animated 3D images. However, this method has one problem: a conflict between convergence and accommodation, so an observer cannot watch such a 3D display for a long time. The authors address this problem. They propose a 3D-display system that uses holography and stereogram technology, in which there is little conflict between convergence and accommodation. The authors developed this 3D-display system. The developed system has four focuses in the horizontal direction. Its display parts are LCDs, so that the system can play 3D movies. Of course, this display does not require special glasses. However, the display is single color: red. The authors will develop a full-color 3D display. The picture size of this display is about 6 inches, and the form of the display is very large. The authors will develop a small-size system that shows a large-size picture.

Keywords: Liquid crystal displays


Evaluation of HOE for head mounted display


Proceedings of SPIE - The International Society for Optical Engineering, 1999, vol. 3637, pp. 110-118

Authors: Ando, Takahisa; Yamasaki, Koji; Okamoto, Masaaki; Matsumoto, Toshiaki; Shimizu, Eiji. Affiliation: Lab of Image Information Science and Technology, Osaka, Japan

We will discuss the characteristics of the Head Mounted Display(HMD) using Holographic Optical Element(HOE) in this paper. We have already proposed that using the HOE we could realize the see-through HMD, that is to say the binocular stereoscopic display. This time we evaluate the influence on the human vision system regarding the optical characteristics of the HOE. The HMD using HOE we proposed so far is the Maxwellian View which is the direct projection on the human retina. When we see something by Maxwellian View, we don't need the focusing of the crystalline lens (ocular accommodation) because the depth field is extremely wide. Therefore our binocular crystalline lens will focus at the vergence point when the Maxwellian View is used on the binocular retina. And we can solve the dissociation of accommodation and convergence which is the basic problem of the conventional HMD. We have made the prototype of HOE which can provide the Maxwellian View on our retina and we have proved that our HOE could separate the binocular images onto left and right eye. In this report, we will introduce that the Maxwellian View will change the ocular accommodation optionally according to the convergence when we see the real objects and the virtual objects at the same time. We proved that the HOE which provided the Maxwellian View could solve the dissociation of accommodation and convergence.

Keywords: Holography


Mitosis detection and classification for breast cancer diagnosis: What we know and what is next


Computers in Biology and Medicine, 2025, vol. 191, p. 110057

Authors: Khalil, Rafi Ullah; Sajjad, Muhammad; Dhahbi, Sami; Bourouis, Sami; Hijji, Mohammad; Muhammad, Khan. Affiliations: Digital Image Processing Lab, Department of Computer Science, Islamia College Peshawar, Peshawar 25000, Pakistan; Applied College of Mahail Aseer, King Khalid University, Muhayil Aseer 62529, Saudi Arabia; Department of Information Technology, College of Computers and Information Technology, Taif University, Taif 21944, Saudi Arabia; Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia; Department of Applied AI, School of Convergence, College of Computing and Informatics, Sungkyunkwan University, Seoul 03063, Republic of Korea

Breast cancer is the second most deadly malignancy in women, behind lung cancer. Despite significant improvements in medical research, breast cancer is still accurately diagnosed with histological analysis. During this procedure, pathologists examine a physical sample for the presence of mitotic cells, or dividing cells. However, the high resolution of histopathology images and the difficulty of manually detecting tiny mitotic nuclei make it particularly challenging to differentiate mitotic cells from other types of cells. Numerous studies have addressed the detection and classification of mitosis, owing to increasing capacity and developments in automated approaches. The combination of machine learning and deep learning techniques has greatly revolutionized the process of identifying mitotic cells by offering automated, precise, and efficient solutions. In the last ten years, several pioneering methods have been presented, advancing towards practical applications in clinical settings. Unlike other forms of cancer, breast cancer and gliomas are categorized according to the number of mitotic divisions. Numerous papers have been published on techniques for identifying mitosis due to easy access to datasets and open competitions. Convolutional neural networks and other deep learning architectures can precisely identify mitotic cells, significantly decreasing the amount of labor that pathologists must perform. This article examines the techniques used over the past decade to identify and classify mitotic cells in histologically stained breast cancer hematoxylin and eosin images. Furthermore, we examine the benefits of current research techniques and predict forthcoming developments in the investigation of breast cancer mitosis, specifically highlighting machine learning and deep learning. © 2025 Elsevier Ltd

Keywords: Lung cancer

