Near-infrared (NIR) image colorization is an image enhancement method that improves the readability of a near-infrared image and enhances its semantic information. To address the problems of color distortion, semantic ambiguity, and unclear texture and shape in current near-infrared image colorization techniques, we propose a novel (to the best of our knowledge) near-infrared image colorization approach based on the generative adversarial network (GAN). The method optimizes both the generator and the discriminator of the GAN and designs a loss function suited to the new network architecture. For the generator, we design and integrate a Res-WTConv-U-Net network. We also design a deep bottleneck block, composed of a residual block and an efficient channel attention (ECA) module, to replace the bottleneck layer in the U-Net, and we replace traditional convolutions with wavelet convolutions (WTConv) to achieve more effective feature extraction and better performance. For the discriminator, we design a dual-scale discriminator that combines two discriminators with different receptive fields, so that both the global structure and local details are taken into account. After training the model on the same dataset, we conduct a comparative experiment against typical image colorization methods, using structural similarity (SSIM), peak signal-to-noise ratio (PSNR), and color histogram similarity (CHS) as image evaluation indices. Comparing the colorization results on two different datasets, PSNR improves by 12.6% on average, SSIM by 7.4% on average, and CHS by 9.5% on average. Experimental results show that the colorization effect of this method is significantly better than that of other methods.
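As a rough illustration of the deep bottleneck block described above, the following PyTorch sketch pairs a residual body with an efficient channel attention (ECA) module. The layer widths, kernel sizes, and the placement of ECA inside the residual branch are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient channel attention: a 1-D conv over pooled channel descriptors."""
    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        w = self.pool(x)                      # (B, C, 1, 1)
        w = w.squeeze(-1).transpose(1, 2)     # (B, 1, C)
        w = torch.sigmoid(self.conv(w))       # local cross-channel interaction
        w = w.transpose(1, 2).unsqueeze(-1)   # back to (B, C, 1, 1)
        return x * w

class DeepBottleneck(nn.Module):
    """Hypothetical bottleneck: residual block + ECA, as the abstract describes."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.eca = ECA(channels)

    def forward(self, x):
        # Skip connection around the attention-refined residual body.
        return torch.relu(x + self.eca(self.body(x)))
```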
The Transformer model has received extensive attention in recent years. Its powerful ability to handle contextual relationships makes it outstanding at the accurate segmentation of medical structures such as organs and lesions. However, as Transformer models become more complex, their computational overhead increases significantly, becoming one of the key factors limiting further performance improvement. In addition, some existing methods use channel dimensionality reduction to model cross-channel relationships. Although this strategy effectively reduces computation, it may lead to information loss or poor segmentation performance on medical images with rich details. To address these problems, we propose an innovative medical image segmentation model, PCMA Former. The model combines convolution with focused weight reparameterization and a channel multi-branch attention mechanism, aiming to improve model performance while maintaining low computational overhead. Through experimental verification on multiple medical image datasets (Synapse, ISIC2017, and ISIC2018), PCMA Former achieves better results than traditional convolutional neural networks and existing Transformer models.
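The abstract does not detail the channel multi-branch attention mechanism, but a minimal PyTorch sketch of the general idea it motivates — parallel cross-channel attention branches that avoid the dimensionality-reduction bottleneck — might look as follows. The branch count, kernel sizes, and averaging rule are hypothetical.

```python
import torch
import torch.nn as nn

class MultiBranchChannelAttention(nn.Module):
    """Hypothetical sketch: parallel channel-attention branches, each modeling
    cross-channel interaction at a different range, with no channel reduction."""
    def __init__(self, channels: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv1d(1, 1, k, padding=k // 2, bias=False) for k in kernel_sizes
        )
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x):
        b, c, _, _ = x.shape
        desc = self.pool(x).view(b, 1, c)    # per-channel descriptor, full width
        weights = sum(br(desc) for br in self.branches) / len(self.branches)
        return x * torch.sigmoid(weights).view(b, c, 1, 1)

attn = MultiBranchChannelAttention(64)
out = attn(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```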
In today's digital world, the vast volume of data generated, often referred to as big data, presents both challenges and opportunities. One significant challenge is the risk of fraud in electronic cash transactions. This study examines and compares 20 common online fraud detection methods within the context of big data, evaluating them based on 11 criteria: type of learning, speed, accuracy, cost (time), complexity, interpretability, scalability, robustness, flexibility, and temporal and spatial complexity. The evaluation highlights the performance of each method against various types of online cash fraud, including identity theft, card skimming, phishing, malware, money laundering, account takeover, refund fraud, and friendly fraud. Performance scores, derived from real-world data and simulations, indicate the effectiveness of each method in identifying and countering fraud in a big data environment. Our findings show that deep learning methods and artificial neural networks outperform other methods in most fraud scenarios, while general rule-based and inferential methods are less effective. This research provides valuable insights for financial institutions, e-commerce platforms, and other online services to enhance their fraud detection capabilities and protect sensitive customer data in the era of big data.
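As an illustration of the kind of multi-criteria comparison the study performs, the Python sketch below ranks detection methods by a weighted score over evaluation criteria. All weights and scores here are placeholders for structure only, not the paper's measured values.

```python
# Placeholder weights over a subset of the 11 criteria named in the abstract.
criteria_weights = {"accuracy": 0.3, "speed": 0.2, "interpretability": 0.15,
                    "scalability": 0.2, "robustness": 0.15}

# Placeholder 0-10 scores per method (illustrative only).
method_scores = {
    "deep_learning": {"accuracy": 9, "speed": 6, "interpretability": 3,
                      "scalability": 8, "robustness": 8},
    "rule_based":    {"accuracy": 5, "speed": 9, "interpretability": 9,
                      "scalability": 6, "robustness": 4},
}

def weighted_score(scores: dict) -> float:
    """Aggregate a method's per-criterion scores into one ranking value."""
    return sum(criteria_weights[c] * s for c, s in scores.items())

for name, scores in sorted(method_scores.items(),
                           key=lambda kv: weighted_score(kv[1]), reverse=True):
    print(f"{name}: {weighted_score(scores):.2f}")
```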
ISBN (Print): 9798350350920
Accurate classification and identification of vessels in remote sensing satellite imagery is critical for ocean monitoring and resource management, and the ability to extract information from remote-sensing data is of paramount importance. To exploit the non-stationary characteristics of synthetic aperture radar (SAR) targets, this paper designs a comprehensive SAR ship recognition framework by combining the second-order synchrosqueezing transform (SST), an effective non-stationary signal-processing tool, with the histogram of oriented gradients (HOG) feature. First, the second-order SST is performed on SAR images to describe the non-stationary characteristics of ships at different times and frequencies. Second, HOG features are used to extract the non-stationary information of SAR ships effectively and provide more discriminative input for the deep learning network. Then, the optimal ResNet model is selected as the convolutional neural network (CNN) classifier to automatically fuse the non-stationary and abstract features of SAR ships. Experiments on two open SAR ship datasets (OpenSARShip and FUSAR-Ship) show that the proposed method achieves accurate classification and outperforms state-of-the-art (SOTA) CNN-based methods in terms of robustness and generalization ability. The positive effect of non-stationary characteristics on SAR ship classification is verified.
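The HOG stage of this pipeline can be reproduced with standard tooling. The sketch below extracts HOG descriptors from a time-frequency image using scikit-image; a random array stands in for the second-order SST magnitude of a SAR chip, and the HOG parameters are common defaults rather than the paper's settings.

```python
import numpy as np
from skimage.feature import hog

def hog_features(tf_image: np.ndarray) -> np.ndarray:
    """Extract a HOG descriptor vector from a 2-D time-frequency image."""
    return hog(tf_image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Placeholder input standing in for an SST-transformed SAR chip.
chip = np.random.rand(64, 64)
feat = hog_features(chip)
print(feat.shape)  # flattened descriptor, ready for a CNN/classifier stage
```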
Vector quantization (VQ) methods have been used in a wide range of applications for speech, image, and video data. While classic VQ methods often use expectation maximization, in this paper, we investigate the use of ...
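The abstract is truncated here, but the classic expectation-maximization approach it alludes to is the Lloyd/k-means codebook training procedure. A minimal NumPy sketch, with illustrative codebook size and iteration count:

```python
import numpy as np

def train_vq_codebook(data: np.ndarray, k: int, iters: int = 20,
                      seed: int = 0) -> np.ndarray:
    """Classic Lloyd-style (EM-like) codebook training for vector quantization."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # E-step: assign each vector to its nearest codeword.
        d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        # M-step: move each codeword to the centroid of its cell.
        for j in range(k):
            members = data[assign == j]
            if len(members):
                codebook[j] = members.mean(0)
    return codebook

vectors = np.random.default_rng(1).normal(size=(500, 8))
cb = train_vq_codebook(vectors, k=16)
print(cb.shape)  # (16, 8)
```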
Emotion recognition plays a crucial role in cognitive science and human-computer interaction. Existing techniques tend to ignore the significant differences between subjects, resulting in limited accuracy and generalization ability, and they have difficulty capturing the complex relationships among the channels of electroencephalography (EEG) signals. A hybrid network is proposed to overcome these limitations. The proposed network comprises a deep adaptive multi-head attention (DAM) branch and a dynamic graph convolution (DGC) branch. The DAM branch uses residual convolution and an adaptive multi-head attention mechanism, allowing it to focus on multi-dimensional information from different representational subspaces at different locations. The DGC branch uses a dynamic graph convolutional neural network that learns topological features among the channels. The synergy of the two branches enhances the model's adaptability to subject differences, and both local feature extraction and global pattern understanding are optimized. Subject-independent experiments were conducted on the SEED and SEED-IV datasets: the average accuracy on SEED was 92.63% with an average F1-score of 92.43%, and the average accuracy on SEED-IV was 85.03% with an average F1-score of 85.01%. The results show that the proposed network has significant advantages in cross-subject emotion recognition and can improve accuracy and generalization ability in emotion recognition tasks.
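A minimal PyTorch sketch in the spirit of the DGC branch: a graph convolution over EEG channels whose adjacency matrix is learned rather than fixed by the electrode montage. The normalization, activation, and dimensions (62 channels, as in SEED) are assumptions.

```python
import torch
import torch.nn as nn

class DynamicGraphConv(nn.Module):
    """Graph convolution over EEG channels with a learned adjacency matrix."""
    def __init__(self, n_channels: int, in_dim: int, out_dim: int):
        super().__init__()
        self.adj = nn.Parameter(torch.randn(n_channels, n_channels) * 0.01)
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x):  # x: (batch, channels, features)
        # Normalize the learned topology, then propagate features along edges.
        a = torch.softmax(torch.relu(self.adj), dim=-1)
        return torch.relu(self.proj(a @ x))

dgc = DynamicGraphConv(n_channels=62, in_dim=5, out_dim=32)
out = dgc(torch.randn(8, 62, 5))
print(out.shape)  # torch.Size([8, 62, 32])
```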
Facial analysis evaluates the physical appearance of a person, which is crucial for several clinical settings. Real-world faces captured in uncontrolled environments make it harder for gender prediction algorithms to identify gender correctly, and these factors reduce the accuracy of the most advanced algorithms currently in use for real-time facial gender prediction. Importantly, facial gender prediction can pave the way for visually challenged persons to identify gender and age. To overcome these challenges and defects, a dual shot face detector with a task-restricted fine-tuned deep neural network (DTFN) is created to recognize facial landmarks for accurate gender and age prediction. Once facial photographs are gathered, bidirectional filtering and sigmoid stretching are applied as the main preprocessing methods to improve contrast and remove noise from the input image. Next, a modified dual shot face detector (DSFD), built around a capsule network (CapsNet), separates the face from the remaining background. A task-constrained deep convolutional neural network (TCDCN) is then used to extract and identify features from the facial landmarks. The extracted features are fed into a fine-tuned deep neural network (DNN) classifier, which classifies the data according to age and gender; fine-tuning is achieved by adjusting the hidden-layer parameters with the stochastic gradient descent technique. Experimental results show that the proposed technique achieves 96% accuracy; thus, the proposed approach is the best option for automatic facial landmark detection.
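Sigmoid stretching, one of the named preprocessing steps, can be sketched directly as a pointwise contrast transform; the gain and cutoff values below are illustrative, not the paper's.

```python
import numpy as np

def sigmoid_stretch(img: np.ndarray, gain: float = 10.0,
                    cutoff: float = 0.5) -> np.ndarray:
    """Sigmoid contrast stretching of an 8-bit grayscale image.

    Maps intensities through 1 / (1 + exp(gain * (cutoff - x))) on [0, 1],
    compressing the extremes and expanding mid-range contrast.
    """
    x = img.astype(np.float64) / 255.0
    y = 1.0 / (1.0 + np.exp(gain * (cutoff - x)))
    return (255.0 * y).astype(np.uint8)

face = (np.random.rand(128, 128) * 255).astype(np.uint8)
print(sigmoid_stretch(face).dtype, sigmoid_stretch(face).shape)
```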
ISBN (Print): 9798350349405; 9798350349399
Sample-efficient neural architecture search (NAS) techniques have advanced rapidly. Two lines of methods, namely neural predictors and sequential search, have shown promising performance in improving the sample efficiency of NAS. However, as far as we know, little attention has been paid to the middle ground between these two lines. Inspired by the analogy between NAS and evolutionary optimization, we propose Sample-Efficient Training for NAS (SET-NAS), based on strategies that improve fitness scores and sampling mechanisms. We develop a strong neural predictor, the Fully Bidirectional Graph Convolutional Network (Fully-BiGCN), which significantly enhances the predictive capability of the features in each layer. The predictor is embedded into an iterative stratified sampling process that retains only a subset of best-fit architectures under the same training budget. SET-NAS achieves remarkable results compared with the state of the art in predictor-based NAS: using NAS-Bench-201 as the benchmark, SET-NAS needs only 27.1% (CIFAR-10), 49.0% (CIFAR-100), and 51.75% (ImageNet-16) of the training cost of other state-of-the-art predictor-based methods to find a promising network architecture.
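A toy sketch of the predictor-guided iterative sampling idea: score a large candidate pool with a cheap predictor and spend the expensive training budget only on the best-fit stratum. The function names, pool size, and keep ratio are hypothetical, and the predictor-refitting step is elided.

```python
import random

def set_nas_loop(sample_space, predictor_score, true_eval,
                 budget: int = 100, pool: int = 1000, keep: float = 0.1):
    """Iteratively sample a pool, keep the predictor's top stratum, and
    train only those architectures until the budget is spent."""
    trained = {}
    while len(trained) < budget:
        candidates = [sample_space() for _ in range(pool)]
        candidates.sort(key=predictor_score, reverse=True)
        for arch in candidates[: int(pool * keep)]:
            if len(trained) >= budget:
                break
            trained[arch] = true_eval(arch)  # expensive ground-truth training
        # A real implementation would refit the predictor on `trained` here.
    return max(trained, key=trained.get)

# Toy demo: architectures are integers, "accuracy" is a noisy function of them.
best = set_nas_loop(
    sample_space=lambda: random.randrange(10_000),
    predictor_score=lambda a: -abs(a - 5000),            # proxy fitness
    true_eval=lambda a: -abs(a - 5000) + random.random(),
    budget=50,
)
print(best)
```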
Video captioning aims to identify multiple objects and their behaviours in a video event and generate captions for the current scene. The task is to generate a detailed natural-language description of the current video in real time, which requires deep learning to analyze and determine the relationships between objects of interest in the frame sequence. In practice, existing methods typically detect objects in the frame sequence and then generate captions from features extracted at the object coverage locations, so the quality of caption generation depends heavily on the performance of object detection and identification. This work proposes an advanced video captioning approach that adaptively and effectively addresses the interdependence between event proposals and captions. An attention-based multimodal framework is introduced to capture the main context from the frames and sound in the video scene, and an intermediate model collects the hidden states captured from the input sequence, extracting the main features and implicitly producing multiple event proposals. For caption prediction, the proposed method employs the CARU layer with attention as the primary RNN layer for decoding. Experimental results show that the proposed work improves on the baseline method and outperforms other state-of-the-art models on the ActivityNet dataset, presenting competitive results on video captioning tasks.
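A sketch of one attention-based decoding step in PyTorch. Since CARU is not available in torch, a GRUCell stands in for it here, and the additive attention form and dimensions are assumptions.

```python
import torch
import torch.nn as nn

class AttnDecoderStep(nn.Module):
    """One decoding step: additive attention over encoder hidden states,
    then a recurrent update (GRUCell substituting for the paper's CARU)."""
    def __init__(self, enc_dim: int, hid_dim: int, emb_dim: int):
        super().__init__()
        self.attn = nn.Linear(enc_dim + hid_dim, 1)
        self.cell = nn.GRUCell(emb_dim + enc_dim, hid_dim)

    def forward(self, word_emb, enc_states, h):
        # enc_states: (B, T, enc_dim); h: (B, hid_dim)
        T = enc_states.size(1)
        q = h.unsqueeze(1).expand(-1, T, -1)
        scores = self.attn(torch.cat([enc_states, q], dim=-1)).squeeze(-1)
        ctx = (torch.softmax(scores, dim=1).unsqueeze(-1) * enc_states).sum(1)
        return self.cell(torch.cat([word_emb, ctx], dim=-1), h)

step = AttnDecoderStep(enc_dim=512, hid_dim=256, emb_dim=128)
h = step(torch.randn(4, 128), torch.randn(4, 20, 512), torch.zeros(4, 256))
print(h.shape)  # torch.Size([4, 256])
```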
ISBN (Print): 9798350349405; 9798350349399
Noisy images are a challenge to image compression algorithms due to the inherent difficulty of compressing noise. As noise cannot easily be discerned from image details, such as high-frequency signals, its presence leads to extra bits being needed for compression. Since the emerging learned image compression paradigm enables end-to-end optimization of codecs, recent efforts have integrated denoising into the compression model, relying on clean image features to guide denoising. However, these methods perform suboptimally under high noise levels and lack the capability to generalize across diverse noise types. In this paper, we propose a novel method for joint image compression and denoising that integrates a multi-scale denoiser composed of Self-Organizing Operational Neural Networks. We employ contrastive learning to boost the network's ability to differentiate noise from high-frequency signal components by emphasizing the correlation between noisy and clean counterparts. Experimental results demonstrate the effectiveness of the proposed method in both rate-distortion performance and codec speed, outperforming the current state of the art.
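The contrastive component can be illustrated with an InfoNCE-style objective that pulls each noisy embedding toward its clean counterpart and pushes it away from other images in the batch; this is a generic sketch, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def info_nce(noisy_feat, clean_feat, temperature: float = 0.1):
    """Contrastive loss over a batch of (noisy, clean) embedding pairs.

    Positives sit on the diagonal of the similarity matrix; every other
    clean embedding in the batch acts as a negative.
    """
    n = F.normalize(noisy_feat, dim=1)
    c = F.normalize(clean_feat, dim=1)
    logits = n @ c.t() / temperature       # (B, B) cosine-similarity matrix
    targets = torch.arange(n.size(0))      # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```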