We employ simulated annealing, a non-convex optimization method, enriched with reversible jumps to enable model selection for deep learning models in a model-size-aware context. Simulated annealing with reversible jumps yields a robust stochastic estimate of the hidden posterior distribution over network structures, simultaneously producing a more focused and certain estimate of the structure while making use of all the data. Because the approach is built on Markov-chain learning methods, we can construct priors that favor smaller and simpler architectures, allowing the search to converge on globally optimal models that are also parameter-efficient, that is, low-parameter-count deep models that retain good predictive accuracy. We demonstrate the capability on standard image recognition with CIFAR-10 as well as model selection on time-series tasks, realizing networks with performance competitive with other non-convex optimization methods such as genetic algorithms, random search, and Gaussian-process-based Bayesian optimization, while being less than half the size.
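For orientation, the following is a minimal sketch of what simulated annealing with reversible-jump moves over an architecture description could look like. It is an illustration, not the authors' method: `evaluate` is a hypothetical stand-in for training and scoring a candidate network, the prior simply penalizes parameter count, and the acceptance rule omits the Jacobian/proposal-ratio corrections of a full reversible-jump sampler.

```python
import math
import random

def prior_penalty(arch, lam=1e-6):
    # Penalize total parameter count (roughly the product of adjacent layer widths).
    params = sum(a * b for a, b in zip(arch, arch[1:]))
    return lam * params

def propose(arch):
    # Reversible-jump moves: grow (add a layer), shrink (remove a layer),
    # or perturb one layer's width.
    move = random.choice(["grow", "shrink", "perturb"])
    new = list(arch)
    if move == "grow":
        new.insert(random.randrange(len(new) + 1), random.choice([16, 32, 64]))
    elif move == "shrink" and len(new) > 1:
        new.pop(random.randrange(len(new)))
    else:
        i = random.randrange(len(new))
        new[i] = max(8, new[i] + random.choice([-16, 16]))
    return new

def anneal(evaluate, init_arch, steps=200, t0=1.0, t_min=1e-3):
    arch = list(init_arch)
    energy = evaluate(arch) + prior_penalty(arch)
    best, best_energy = arch, energy
    for step in range(steps):
        t = max(t_min, t0 * (1 - step / steps))          # linear cooling schedule
        cand = propose(arch)
        cand_energy = evaluate(cand) + prior_penalty(cand)
        # Metropolis acceptance at temperature t (simplified, no RJ correction).
        if cand_energy < energy or random.random() < math.exp((energy - cand_energy) / t):
            arch, energy = cand, cand_energy
            if energy < best_energy:
                best, best_energy = arch, energy
    return best, best_energy

# Toy usage with a stand-in score; real use would train and validate the network.
best_arch, _ = anneal(lambda a: 1.0 / (1 + sum(a)), [64, 64], steps=50)
```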
With the development of deep learning, Convolutional Neural Networks (CNNs) have become a mainstream method for image classification, and the emergence of the ResNet architecture has significantly accelerated this process. However, as model depth increases, feature redundancy limits model performance. Although traditional machine learning methods such as Principal Component Analysis (PCA) can effectively remove redundant features, there has been no effective way to integrate PCA as a feature extraction technique into different convolutional neural network architectures. This work proposes a Selective Principal Component Layer (SPCL), a feature extraction method that effectively incorporates PCA into convolutional neural networks to filter essential features and improve the feature representation ability of deep learning models. SPCL is applied to ResNet architecture models to reduce redundant features and enhance generalization performance in image classification tasks. Evaluations on the CIFAR-10 and Tiny ImageNet datasets demonstrate its effectiveness. The results show that SPCL can be applied generally to ResNet architecture models and improves model accuracy, balancing gains in performance and stability without adding significant computational overhead, demonstrating its potential to enhance performance in complex image classification tasks.
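As a rough illustration of the underlying idea (not the authors' SPCL design), the sketch below shows one way a PCA-style projection could sit inside a CNN forward pass: channel features are projected onto their leading principal components to suppress redundant directions. The channel counts and component number are assumptions.

```python
import torch
import torch.nn as nn

class PCAProjectionLayer(nn.Module):
    """Illustrative layer: keep only the top-k principal directions of the channels."""
    def __init__(self, num_components: int):
        super().__init__()
        self.num_components = num_components

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W). Treat each channel as a variable and every spatial
        # position across the batch as an observation.
        n, c, h, w = x.shape
        flat = x.permute(0, 2, 3, 1).reshape(-1, c)           # (N*H*W, C)
        mean = flat.mean(dim=0, keepdim=True)
        centered = flat - mean
        # Principal directions via low-rank PCA of the channel covariance.
        _, _, v = torch.pca_lowrank(centered, q=self.num_components)   # v: (C, k)
        projected = centered @ v @ v.T + mean                  # reconstruct from top-k PCs
        return projected.reshape(n, h, w, c).permute(0, 3, 1, 2)

# Example: filter the output of a ResNet stage with 64 channels (shapes assumed).
layer = PCAProjectionLayer(num_components=16)
features = torch.randn(8, 64, 32, 32)
filtered = layer(features)
```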
Medical image segmentation is a critical and complex process in medical image processing and analysis. With the development of artificial intelligence, the application of deep learning to medical image segmentation is becoming increasingly widespread. Existing techniques are mostly based on the U-shaped convolutional neural network and its variants, such as the U-Net framework, which uses skip connections or element-wise addition to fuse features from different levels in the decoder. However, these operations often weaken the compatibility between features at different levels, leading to a significant amount of redundant information and imprecise lesion segmentation. The construction of the loss function is a key factor in neural network design, but traditional loss functions lack strong domain generalization, and the interpretability of domain-invariant features needs improvement. To address these issues, we propose a Bayesian loss-based Multi-Scale Subtraction Attention Network (MSAByNet). Specifically, we propose an inter-layer and intra-layer multi-scale subtraction attention module, with receptive fields of different sizes set at different levels to avoid losing feature-map resolution and edge detail. Additionally, we design a multi-scale deep spatial attention mechanism to learn spatial information and enrich multi-scale differential information. Furthermore, we introduce a Bayesian loss that re-models the image in spatial terms, enabling MSAByNet to capture stable shapes and improving domain generalization. We evaluate the proposed network on two publicly available datasets, the BUSI dataset and the Kvasir-SEG dataset. Experimental results demonstrate that the proposed MSAByNet outperforms several state-of-the-art segmentation methods. The code is available at https://***/zlxokok/MSAByNet.
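The sketch below illustrates the general flavor of subtraction-based feature fusion described above: adjacent decoder levels are compared by element-wise subtraction rather than concatenation or addition, so the fused signal emphasizes differences between levels. The module name, channel counts, and convolution block are illustrative assumptions, not MSAByNet's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionUnit(nn.Module):
    """Illustrative subtraction-style fusion of two decoder feature levels."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, high: torch.Tensor, low: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser feature map to the finer resolution, then take
        # the absolute element-wise difference as the fused signal.
        low_up = F.interpolate(low, size=high.shape[-2:], mode="bilinear",
                               align_corners=False)
        diff = torch.abs(high - low_up)
        return self.conv(diff)

# Example with two decoder levels of 64 channels each (shapes assumed).
unit = SubtractionUnit(channels=64)
f_high = torch.randn(2, 64, 64, 64)
f_low = torch.randn(2, 64, 32, 32)
fused = unit(f_high, f_low)
```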
This paper introduces a novel family of exponential sampling type neural network Kantorovich operators, extending the work of Bajpeyi and Kumar (2021) and Bajpeyi (2023). Unlike previous research focused on approximating continuous functions, our operators are designed to handle Lebesgue integrable functions, offering enhanced versatility. We establish convergence theorems, analyze asymptotic behavior, and demonstrate the effectiveness of linear combinations for improving convergence rates. Our analysis extends to the multivariate setting, highlighting the operators' capability of approximating a wide range of functions. To evaluate the practical performance of the proposed operators, we conducted numerical experiments with different sigmoidal functions and parameter values. Our findings reveal that operators activated by the parametric sigmoid function consistently outperform those activated by other sigmoidal functions, achieving up to a 20.70% reduction in maximum absolute error and a 10.03% reduction in root mean squared error. When applied to image scaling, our operators demonstrated superior performance compared with widely used methods such as nearest-neighbor, bilinear, and bicubic interpolation. For the 'Baboon' image, we observed up to a 5.62% increase in Peak Signal-to-Noise Ratio (PSNR) and a 5.25% increase in Structural Similarity Index Measure (SSIM). Similar enhancements were observed for the 'Flowers' and 'Retina' images. The paper includes a detailed description of the image processing algorithm, along with a flowchart illustrating the implementation. These results underscore the operators' potential in various machine learning tasks, motivating further research into their applications and optimization.
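For readers unfamiliar with this operator family, the classical (non-exponential) sampling Kantorovich operator, the prototype on which such constructions build, is shown below for orientation only; the paper's exponential-sampling operators use Mellin-type, exponentially spaced nodes and kernels built from neural network (sigmoidal) activations, so the exact form there differs.

```latex
% Classical sampling Kantorovich operator (non-exponential prototype),
% with \chi a suitable kernel, e.g. constructed from a sigmoidal activation:
(S_w f)(x) \;=\; \sum_{k \in \mathbb{Z}} \chi(wx - k)\; w \int_{k/w}^{(k+1)/w} f(u)\, du,
\qquad x \in \mathbb{R},\; w > 0.
```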
Image segmentation is an essential initial stage in several computer vision applications. However, unsupervised image segmentation remains challenging in some cases, for example when objects with similar visual appearance overlap. Unlike 2D images, 4D Light Fields (LFs) convey both spatial and angular scene information, facilitating depth/disparity estimation, which can in turn be used to guide segmentation. Existing 4D LF segmentation methods that target object-level (i.e., mid-level and high-level) segmentation are typically semi-supervised or supervised with ground-truth labels and mostly support only densely sampled 4D LFs. This paper proposes a novel unsupervised mid-level 4D LF segmentation method using Graph Neural Networks (LFSGNN), which segments all LF views consistently. To achieve this, the 4D LF is represented as a hypergraph whose hypernodes are obtained by hyperpixel over-segmentation. A graph neural network then extracts deep features from the LF and assigns segmentation labels to all hypernodes, and the network parameters are updated iteratively via backpropagation to achieve better object separation. The proposed method supports both densely and sparsely sampled 4D LFs. Experimental results on synthetic and real 4D LF datasets show that the proposed method outperforms benchmark methods in terms of both segmentation spatial accuracy and angular consistency.
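The following is a hedged sketch of unsupervised, graph-based segmentation in the spirit of the description above: node features (e.g., hyperpixel descriptors) are refined by a small graph network, and the network is trained against its own argmax labels so that clusters sharpen over iterations. The GNN layer, feature dimensions, and self-training loss are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGNN(nn.Module):
    """Two-layer graph network over a dense, row-normalized adjacency matrix."""
    def __init__(self, in_dim: int, hidden: int, num_labels: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, num_labels)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = F.relu(adj @ self.lin1(x))       # aggregate neighbor features
        return adj @ self.lin2(h)            # per-hypernode label logits

def unsupervised_segment(x, adj, num_labels=16, iters=100, lr=1e-2):
    model = SimpleGNN(x.shape[1], 64, num_labels)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iters):
        logits = model(x, adj)
        pseudo = logits.argmax(dim=1)                 # current hard assignments
        loss = F.cross_entropy(logits, pseudo)        # self-training objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return logits.argmax(dim=1)                       # one label per hypernode
```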
In recent years, the crucial task of image compression has been addressed by end-to-end neural network methods. However, achieving fine-grained rate control in this new paradigm has proven challenging. In our previous work, we explored mismatches in rate estimation during target-rate-oriented training and proposed heuristics involving costly parameter searches as a solution. This work proposes a lightweight approach that dynamically adapts loss parameters to mitigate rate estimation issues, ensuring that the target rate is attained precisely. Inspired by reinforcement learning, our method achieves PSNR performance on the Kodak dataset comparable to the preceding approaches while reducing computational training cost.
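A minimal sketch of the core idea as we read it: treat the rate-distortion weight as a control variable and adapt it during training so the estimated rate converges to the target rate, instead of searching it by costly restarts. The multiplicative update rule and the toy rate model below are illustrative assumptions, not the paper's exact reinforcement-learning-inspired scheme.

```python
def adapt_lambda(lam: float, measured_rate: float, target_rate: float,
                 gain: float = 0.05) -> float:
    # Raise lambda (penalize rate more) when over the target, lower it when under.
    return lam * (1.0 + gain * (measured_rate - target_rate) / target_rate)

# Toy closed-loop demonstration with a made-up rate model in which a larger
# lambda yields a lower bitrate; a real codec would report its rate estimate here.
lam, target = 0.01, 0.75                     # target in bits per pixel (assumed)
for step in range(200):
    measured = 1.5 / (1.0 + 40.0 * lam)      # stand-in for the codec's rate estimate
    lam = adapt_lambda(lam, measured, target)
print(round(measured, 3), round(lam, 4))     # measured rate should approach 0.75
```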
Deep learning plays an important role in machine learning and has been developed and applied in a wide range of areas. Many deep-learning-based methods have been proposed to improve image resolution, most of which are based on image-to-image translation algorithms. The performance of neural networks used for image translation depends on the feature difference between input and output images, so these methods sometimes perform poorly when the feature differences between low-resolution and high-resolution images are too large. In this paper, we introduce a dual-step neural network algorithm that improves image resolution step by step. Compared with conventional deep-learning methods trained on input and output images with large differences, this algorithm learns from input and output images with smaller differences, which improves the performance of the neural networks. The method was used to reconstruct high-resolution images of fluorescent nanoparticles in cells. (c) 2023 Optica Publishing Group
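A hedged sketch of the dual-step idea: instead of mapping low-resolution images directly to high resolution, one network handles a modest upscaling step and a second network handles the remaining step, so each stage only bridges a small feature gap. The placeholder architecture, scale factors, and image sizes below are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StepNet(nn.Module):
    """Placeholder single-step super-resolution network (2x upsampling)."""
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.body = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode="bilinear",
                          align_corners=False)
        return self.body(x) + x             # residual refinement after upsampling

step1 = StepNet(scale=2)                    # low resolution -> intermediate resolution
step2 = StepNet(scale=2)                    # intermediate   -> high resolution
lr_image = torch.randn(1, 1, 64, 64)
hr_estimate = step2(step1(lr_image))        # 64 -> 128 -> 256 pixels per side
```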
Low-Dose Computed Tomography (LDCT) has gradually replaced Normal-Dose Computed Tomography (NDCT) due to its lower radiation exposure. However, the reduction in radiation dose leads to increased noise and artifacts in LDCT images. Many methods for LDCT denoising have emerged, but they often struggle to balance denoising performance with reconstruction efficiency. This paper presents a novel Momentum Context Diffusion model for low-dose CT denoising, termed MoCoDiff. First, MoCoDiff employs a Mean-Preserving Stochastic Degradation (MPSD) operator to gradually degrade NDCT to LDCT, effectively simulating the physical process of CT degradation and greatly reducing the number of sampling steps. The stochastic nature of the MPSD operator also enhances the diversity of samples in the training space and calibrates the deviation between network inputs and time-step-embedded features. Second, we propose a Momentum Context (MoCo) strategy, which uses the most recent sampling result from each step to update the context information, narrowing the noise-level gap between the sampling results and the context data and thereby better guiding the next sampling step. Finally, to prevent issues such as over-smoothing of image edges that can arise from the mean-square-error loss, we develop a dual-domain loss function that operates in both the image and wavelet domains, leveraging wavelet-domain information to encourage the model to preserve structural details. Extensive experimental results show that MoCoDiff outperforms competing methods in both denoising and generalization performance, while also ensuring fast training and inference.
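To make the degradation idea concrete, here is a hedged sketch of a mean-preserving stochastic degradation step: the forward process interpolates from an NDCT image toward its LDCT counterpart and adds zero-mean noise, so the expectation at every step stays on the NDCT-to-LDCT path. The schedule and noise scale are illustrative assumptions, not the exact MPSD operator of MoCoDiff.

```python
import torch

def mpsd_step(ndct: torch.Tensor, ldct: torch.Tensor, t: int, T: int,
              sigma: float = 0.01) -> torch.Tensor:
    alpha = t / T                                  # degradation progress in [0, 1]
    mean = (1.0 - alpha) * ndct + alpha * ldct     # mean-preserving interpolation
    return mean + sigma * alpha * torch.randn_like(ndct)   # zero-mean perturbation

# Example: build a training sample at an intermediate time step (shapes assumed).
ndct = torch.randn(1, 1, 256, 256)
ldct = ndct + 0.1 * torch.randn_like(ndct)
x_t = mpsd_step(ndct, ldct, t=5, T=10)
```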
Adaptive Fourier Decomposition (AFD) is a recently developed signal processing tool that can adaptively decompose any single signal using a Szego kernel dictionary. To process multiple signals, a novel Stochastic-AFD (SAFD) theory was recently proposed. The contribution of this study is twofold. First, a SAFD-based general multi-signal sparse representation learning algorithm is designed and implemented for the first time in the literature; it can be used in many signal and image processing areas. Second, a novel SAFD-based image compression framework is proposed. The algorithm design and implementation of the SAFD theory and the image compression methods are presented in detail. The proposed compression methods are compared with 13 other state-of-the-art compression methods, including JPEG, JPEG2000, BPG, and popular deep-learning-based methods. The experimental results show that our methods achieve the best balanced performance. The proposed methods are based on single-image adaptive sparse representation learning and require no pre-training. In addition, the decompression quality or compression efficiency can be adjusted easily with a single parameter, the decomposition level. Our method is supported by a solid mathematical foundation and has the potential to become a new core technology in image compression.
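The sketch below shows the generic greedy structure shared by AFD-style sparse decompositions: at each level, pick the dictionary atom most correlated with the residual and subtract its contribution, so the decomposition level directly trades reconstruction quality against the number of stored coefficients. A generic unit-norm dictionary stands in for the Szego kernel dictionary, and nothing here reproduces the SAFD algorithm itself.

```python
import numpy as np

def greedy_decompose(signal: np.ndarray, dictionary: np.ndarray, levels: int):
    """signal: (n,), dictionary: (num_atoms, n) with unit-norm rows."""
    residual = signal.astype(float).copy()
    atoms, coeffs = [], []
    for _ in range(levels):
        corr = dictionary @ residual            # correlation with every atom
        k = int(np.argmax(np.abs(corr)))
        atoms.append(k)
        coeffs.append(corr[k])
        residual = residual - corr[k] * dictionary[k]
    return atoms, coeffs, residual

def reconstruct(atoms, coeffs, dictionary):
    # Sum the selected atoms weighted by their coefficients.
    return sum(c * dictionary[k] for k, c in zip(atoms, coeffs))
```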
As deep neural network (DNN) models become more accurate, problems such as large parameter counts and high computational complexity have become increasingly prominent, creating a bottleneck for deployment on resource-limited embedded platforms. In recent years, logarithm-based quantization techniques have shown great potential for reducing the inference cost of neural networks. However, current single-model log-quantization has reached an upper limit of classification performance, and little work has investigated hardware implementations of neural network quantization. In this paper, we propose a full logarithmic quantization (FLQ) mechanism that quantizes both weights and activations into the logarithmic domain, compressing the parameters of the AlexNet and VGG16 models by more than 6.4 times while keeping the accuracy loss within 2.5% of the benchmark. Furthermore, we propose two optimizations of FLQ, activation-segmented full logarithmic quantization (ASFLQ) and multi-ratio activation-segmented full logarithmic quantization (Multi-ASFLQ), which better balance the numerical representation range and the quantization step. With 5-bit weight quantization and 4-bit activation quantization, the proposed optimizations improve the Top-1 accuracy of the VGG16 model by 1% and 1.6%, respectively. We then propose a computing-unit implementation corresponding to the optimized FLQ mechanism, which not only converts multiplications into shift operations but also integrates functions such as different-ratio logarithmic bases and sparsity processing for activations, minimizing resource consumption and avoiding unnecessary calculations. Finally, experiments with the VGG19, ResNet50, and DenseNet169 models show that the proposed method achieves good performance under lower-bit quantization.
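As a baseline illustration of plain logarithmic quantization (the starting point that FLQ/ASFLQ build on, not the paper's full scheme), the sketch below snaps each weight to a signed power of two so that multiplications reduce to bit shifts. The bit width, exponent range, and dead-zone rule are illustrative assumptions.

```python
import numpy as np

def log2_quantize(w: np.ndarray, bits: int = 5) -> np.ndarray:
    # Use the available levels for exponents below the largest observed magnitude.
    num_levels = 2 ** (bits - 1) - 1                 # exponent levels per sign
    exp = np.round(np.log2(np.abs(w) + 1e-12))       # nearest power-of-two exponent
    exp_max = exp.max()
    exp_min = exp_max - num_levels + 1
    exp = np.clip(exp, exp_min, exp_max)             # keep the largest magnitudes exact
    q = np.sign(w) * np.exp2(exp)
    q[np.abs(w) < np.exp2(exp_min) / 2] = 0.0        # dead zone: tiny weights become zero
    return q

weights = np.random.randn(1000) * 0.1
quantized = log2_quantize(weights, bits=5)           # every nonzero value is +/- 2^k
```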