Infrared spectroscopy analysis has found widespread applications in various fields due to advancements in technology and industry. To improve the quality and reliability of infrared spectroscopy signals, deconvolution is a crucial preprocessing step. Inspired by the transformer model, we propose an Auto-correlation Multi-head attention Transformer (AMTrans) for infrared spectrum sequence deconvolution. The auto-correlation attention model improves the scaled dot-product attention in the transformer. It utilizes the attention mechanism for feature extraction and implements attention computation using the auto-correlation function. The auto-correlation attention model is used to exploit the inherent sequence nature of spectral data and to effectively recover spectra by capturing auto-correlation patterns in the sequence. The proposed model is trained using supervised learning and demonstrates promising results in infrared spectroscopic [...]. By comparing the experiments with other deconvolution techniques, the experimental results show that the method has excellent deconvolution performance and can effectively recover the texture details of the infrared spectrum.
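To make the phrase "auto-correlation patterns" in the abstract above concrete, the following is a minimal sketch of the normalized auto-correlation of a 1-D sequence. It is not the AMTrans attention layer; the function name `autocorrelation`, the `max_lag` parameter, and the toy `spectrum` values are assumptions introduced only for illustration.

```python
def autocorrelation(signal, max_lag):
    """Normalized auto-correlation r[0..max_lag] of a 1-D sequence.

    Hypothetical helper for illustration; not taken from the AMTrans paper.
    """
    n = len(signal)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(signal)")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(c * c for c in centered)  # lag-0 value, used for normalization
    if energy == 0:
        raise ValueError("auto-correlation is undefined for a constant signal")
    result = []
    for lag in range(max_lag + 1):
        # Sum of products between the sequence and a copy of itself shifted by `lag`.
        acc = sum(centered[t] * centered[t + lag] for t in range(n - lag))
        result.append(acc / energy)
    return result


# Toy "spectrum" (made-up values) that repeats every 4 samples.
spectrum = [0.0, 1.0, 0.0, -1.0] * 8
r = autocorrelation(spectrum, max_lag=8)
# r peaks again at lag 4, the period of the repeating pattern.
```

A lag at which `r` is large marks a shift under which the sequence resembles itself; this kind of sequence-level similarity is what an auto-correlation-based attention score can draw on in place of pointwise dot products.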
Authors: Li, Dingyi
PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
Perceptual video super-resolution aims at converting low-resolution videos to visually appealing high-resolution ones. It may lead to temporal inconsistency due to the drastically changing outputs. In this paper, we p...
Authors: Wang, Kun; Yan, Zhiqiang; Fan, Junkai; Li, Jun; Yang, Jian
PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
Depth completion endeavors to reconstruct a dense depth map from sparse depth measurements, leveraging the information provided by a corresponding color image. Existing approaches mostly hinge on single-scale propagat...
Continued advances in self-supervised learning have led to significant progress in video representation learning, offering a scalable alternative to supervised approaches by eliminating the need for manual annotations. Despite strong performance on standard action recognition benchmarks, existing video self-supervised learning methods are predominantly evaluated within narrow protocols (typically pre-training on Kinetics-400 and finetuning on similar datasets), limiting our understanding of their generalization capabilities in real-world settings. In this work, we present a comprehensive evaluation of modern video self-supervised learning models, focusing on generalization across four key downstream factors: domain shift, sample efficiency, action granularity, and task diversity. Building on our prior work analyzing benchmark sensitivity in CNN-based contrastive learning, we extend the study to cover current state-of-the-art transformer-based video-only and video-text representation models. Specifically, we benchmark 12 transformer-based methods (7 video-only, 5 video-text) and compare them against 10 CNN-based methods, resulting in over 1100 experiments across 8 datasets and 7 downstream tasks. Our analysis reveals that, despite architectural advancements, transformer-based models remain sensitive to downstream conditions. No single method generalizes consistently across all factors; for instance, video-only transformers are more robust to domain shift, CNN-based models perform better on tasks requiring fine-grained temporal reasoning, and video-text transformers underperform both in several downstream settings despite large-scale pretraining. We also observe that recent transformer-based approaches do not universally outperform earlier methods. These findings provide a detailed understanding of the capabilities and limitations of current video self-supervised learning approaches and establish an extended benchmark for evaluating generalization in video representation
Recently, Mix-style data augmentation methods (e.g., Mixup and CutMix) have shown promising performance in various visual tasks. However, these methods are primarily designed for single-label images, ignoring the cons...
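For readers unfamiliar with the Mixup augmentation named in the entry above, here is a minimal sketch of standard single-label Mixup: two samples and their one-hot labels are blended with the same weight. It shows only the baseline technique, not the method this entry goes on to propose; the function name `mixup`, the toy feature vectors, and the Beta parameter 0.4 are assumptions chosen for illustration.

```python
import random


def mixup(x_a, y_a, x_b, y_b, lam):
    """Blend two samples and their one-hot labels with weight lam in [0, 1].

    Hypothetical helper illustrating standard Mixup on flat feature lists.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("samples and labels must have matching lengths")
    # The same convex combination is applied to inputs and to labels.
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix


# Fixed weight so the result is easy to check by hand.
x_mix, y_mix = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)

# In practice the weight is usually drawn from a Beta distribution;
# the value 0.4 here is an assumed hyperparameter.
rng = random.Random(0)
lam_sampled = rng.betavariate(0.4, 0.4)
```

Because the mixed label is a blend of two one-hot vectors, this formulation presumes exactly one class per image, which is the single-label design the entry refers to.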
Shortcut learning, a phenomenon where deep neural networks inadvertently learn irrelevant features, has been extensively discussed due to its impact on model generalization and unexpected failures. Interpreting and di...
Scanpath prediction for omnidirectional images aims to effectively simulate the human visual perception mechanism to generate dynamic, realistic fixation trajectories. However, the majority of scanpath prediction methods for omnidirectional images are still in their infancy, as they fail to accurately capture the time-dependency of viewing behavior and suffer from sub-optimal performance along with limited generalization capability. A desirable solution should achieve a better trade-off between prediction performance and generalization ability. To this end, we propose a novel dual-temporal modulation scanpath prediction (ScanDTM) model for omnidirectional images. The model is designed to effectively capture long-range time-dependencies between various fixation regions across both internal and external time dimensions, thereby generating more realistic scanpaths. In particular, we design a Dual Graph Convolutional Network (Dual-GCN) module comprising a semantic-level GCN and an image-level GCN. This module serves as a robust visual encoder that captures spatial relationships among various object regions within an image and fully utilizes similar images as complementary information to capture similarity relations across relevant images. Notably, the proposed Dual-GCN focuses on modeling temporal correlations from both local and global perspectives within the internal time dimension. Furthermore, drawing inspiration from the promising generalization capabilities of diffusion models across various generative tasks, we introduce a novel diffusion-guided saliency module. This module formulates the prediction problem as a conditional generative process for the saliency map, utilizing the extracted semantic-level and image-level visual features as conditions. With the well-designed diffusion-guided saliency module acting as an external temporal modulator, our proposed ScanDTM model can progressively refine the generated scanpath from the noisy map. We conduct extensive expe
In recent years, research on 3D displays has developed rapidly, and many methods have been proposed. Holography is the best of these methods, but developing holographic movies is expected to remain difficult for the time being. At present, the stereogram method is likely to become practical in the near future, since it can easily produce animated 3D images. However, this method has one problem: a conflict between convergence and accommodation, so an observer cannot watch such a 3D display for a long time. The authors aim to solve this problem and have proposed a 3D display system that combines holography and stereogram technology. The proposed system has little conflict between convergence and accommodation. The authors developed this system, which has four focal points in the horizontal direction. Its display elements are LCDs so that the system can play 3D movies. Of course, the display requires no special glasses. However, the display is monochrome (red); the authors plan to develop a full-color 3D display. The picture size is about 6 inches, and the apparatus itself is very large. The authors plan to develop a smaller system that shows a larger picture.
In this paper, we discuss the characteristics of a head-mounted display (HMD) that uses a holographic optical element (HOE). We previously proposed that an HOE could be used to realize a see-through HMD, that is, a binocular stereoscopic display. Here we evaluate how the optical characteristics of the HOE affect the human visual system. The HOE-based HMD we have proposed uses the Maxwellian view, in which the image is projected directly onto the retina. Viewing through the Maxwellian view requires no focusing of the crystalline lens (ocular accommodation) because the depth of field is extremely wide. Therefore, when the Maxwellian view is presented to both eyes, the crystalline lenses focus at the vergence point. This resolves the dissociation of accommodation and convergence, which is the basic problem of conventional HMDs. We built a prototype HOE that provides the Maxwellian view on the retina and showed that it can separate the binocular images for the left and right eyes. In this report, we show that, under the Maxwellian view, ocular accommodation changes freely according to convergence when real and virtual objects are viewed at the same time. We showed that an HOE providing the Maxwellian view can resolve the dissociation of accommodation and convergence.
Breast cancer is the second most deadly malignancy in women, behind lung cancer. Despite significant improvements in medical research, breast cancer is still accurately diagnosed with histological analysis. During thi...