Canonical polyadic decomposition (CPD) has been a workhorse for multimodal data analytics. This work puts forth a stochastic algorithmic framework for CPD under β-divergence, which is well motivated in statistical learning, where the Euclidean distance is typically not preferred. Although a series of prior works addresses this topic, pressing computational and theoretical challenges, e.g., scalability and convergence issues, remain. In this paper, a unified stochastic mirror descent framework is developed for large-scale β-divergence CPD. Our key contribution is the integrated design of a tensor fiber sampling strategy and a flexible stochastic Bregman divergence-based mirror descent iterative procedure, which significantly reduces the computation and memory cost per iteration for various β. Leveraging the fiber sampling scheme and the multilinear algebraic structure of low-rank tensors, the proposed lightweight algorithm also ensures global convergence to a stationary point under mild conditions. Numerical results on synthetic and real data show that our framework attains significant computational savings compared with state-of-the-art methods.
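As a rough illustration of two ingredients named in this abstract, the sketch below evaluates the standard β-divergence and samples mode-1 fibers of a third-order tensor together with the matching rows of the Khatri-Rao product. It is not the paper's algorithm: the mirror descent update is omitted, and the function names `beta_divergence` and `sample_fibers`, the uniform sampling, and the restriction to three modes are all assumptions made for this example.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of element-wise beta-divergences d_beta(x | y) for positive arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return float(np.sum(x * np.log(x / y) - x + y))
    if beta == 0:  # Itakura-Saito divergence
        return float(np.sum(x / y - np.log(x / y) - 1.0))
    # General case; beta == 2 gives half the squared Euclidean distance.
    num = x**beta + (beta - 1.0) * y**beta - beta * x * y**(beta - 1.0)
    return float(np.sum(num / (beta * (beta - 1.0))))

def sample_fibers(X, B, C, num_fibers, rng):
    """Uniformly sample mode-1 fibers X[:, j, k] of an I x J x K tensor.

    Returns the sampled fibers (I x num_fibers) and the corresponding rows
    B[j] * C[k] of the Khatri-Rao product (num_fibers x R), so that a rank-R
    model with factors A, B, C predicts the fibers as A @ H.T.
    """
    J, K = X.shape[1], X.shape[2]
    j_idx = rng.integers(0, J, size=num_fibers)
    k_idx = rng.integers(0, K, size=num_fibers)
    return X[:, j_idx, k_idx], B[j_idx] * C[k_idx]

# Hypothetical usage on a small, exactly rank-3 positive tensor.
rng = np.random.default_rng(0)
A, B, C = (rng.random((n, 3)) + 0.1 for n in (4, 5, 6))
X = np.einsum("ir,jr,kr->ijk", A, B, C)
fibers, H = sample_fibers(X, B, C, num_fibers=8, rng=rng)
loss = beta_divergence(fibers, A @ H.T, beta=1)  # close to zero for the true factors
```

Only the sampled fibers and the small matrix `H` enter the loss, which is the kind of per-iteration saving the abstract attributes to fiber sampling.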
This study investigates the statistical characteristics of liquid film thickness fluctuations during countercurrent flow limitation in a complex geometry consisting of a horizontal pipe and an inclined pipe connected by an elbow. The fluctuation data are obtained by applying an image processing technique to high-quality images captured by high-speed video recording. The results indicate that thickening of the liquid triggers the liquid blockage that initiates flooding at the upper edge of the hydraulic jump. Moreover, the stochastic analysis reveals that the data show consistent trends across the whole range of water discharges and agree well with the visual observations. Meanwhile, the wavelet analysis can distinguish energy fluctuations under different water discharges.
ISBN: 9781665487351 (print)
The main aim of this research is to filter skin cancer MRI images with image processing techniques using a novel median filter algorithm, and to compare it with the Gabor filter algorithm. The study contains two groups, each with a sample size of 20 and a G-power of 80 percent. The performance of the novel median filter is evaluated, and performance measurements such as PSNR (Peak Signal to Noise Ratio) and MSE (Mean Square Error) are compared with those of the Gabor filter. According to the data obtained by simulation in MATLAB, the novel median filter's PSNR is 31.64 and its MSE is 9.094, whereas the Gabor filter's PSNR is 27.02 and its MSE is 11.52. The statistical analysis gives a significance value of 0.409 for PSNR ($\mathbf{p} > 0.05$) and 0.010 for MSE ($\mathbf{p} < 0.05$). The study finds that the novel median filter performs better than the Gabor filter in terms of PSNR and MSE.
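For readers unfamiliar with the two metrics, the following minimal sketch shows how MSE and PSNR are commonly computed for a reference image and a filtered image. The 3x3 median filter here is the plain textbook version, not the novel variant the abstract evaluates, and the function names and the 8-bit peak value of 255 are assumptions made for this example.

```python
import numpy as np

def mse(reference, filtered):
    """Mean squared error between two images of the same shape."""
    ref = np.asarray(reference, dtype=float)
    out = np.asarray(filtered, dtype=float)
    return float(np.mean((ref - out) ** 2))

def psnr(reference, filtered, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    err = mse(reference, filtered)
    if err == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value**2 / err))

def median_filter_3x3(image):
    """Plain 3x3 median filter with edge replication (standard, not the novel variant)."""
    img = np.asarray(image, dtype=float)
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image and take the per-pixel median.
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0)

# Hypothetical usage: a flat image corrupted by a single impulse.
clean = np.full((5, 5), 10.0)
noisy = clean.copy()
noisy[2, 2] = 255.0
restored = median_filter_3x3(noisy)
quality_before = psnr(clean, noisy)
quality_after = psnr(clean, restored)
```

A higher PSNR and a lower MSE both indicate that the filtered image is closer to the reference.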
ISBN: 9781665446006 (print)
Leaf image recognition is a fine-grained image classification task in the computer vision domain that is fundamental yet challenging in both the biology and computer vision research communities. With the advent of the Convolutional Neural Network (CNN), encouraging accuracy has been achieved on leaf image recognition. However, current CNN-based methods mainly capture the geometric and statistical distribution properties of leaves and ignore topological features. In this paper, we integrate the topological descriptor Persistence Image (PI) with the classic CNN model VGG16 to improve the accuracy of leaf image classification. A pretrained model, Pi-net, is adopted as the PI extractor, and an attention module is introduced to fuse the topological features with the features learned by the CNN. Experiments on the species datasets Flavia, Swedish, and Folio, and on a more fine-grained cultivar dataset, Cherry, demonstrate the effectiveness of the proposed method. Our method achieves an accuracy of 99.85% on the Flavia dataset, 99.92% on the Swedish dataset, 99.64% on the Folio dataset, and 68.95% on the Cherry image dataset. The large accuracy improvement on the Cherry dataset further demonstrates the advantage of our method in fine-grained leaf image classification.
Alzheimer’s disease (AD) is the most common type of dementia. In all leading countries, it is one of the primary causes of death among senior citizens. Currently, it is diagnosed by calculating the MSME score and by th...
ISBN: 9781450368599 (print)
Deep learning is developed as a learning process from source inputs to target outputs where the inference or optimization is performed over an assumed deterministic model with deep structure. A wide range of temporal and spatial data in language and vision are treated as the inputs or outputs to build such a complicated mapping in different information systems. A systematic and elaborate transfer is required to meet the mapping between source and target domains. Also, the semantic structure in natural language and computer vision may not be well represented or trained in mathematical logic or computer programs. The distribution function in discrete or continuous latent variable model for words, sentences, images or videos may not be properly decomposed or estimated. The system robustness to heterogeneous environments may not be assured. This tutorial addresses the fundamentals and advances in statistical models and neural networks, and presents a series of deep Bayesian solutions including variational Bayes, sampling method, Bayesian neural network, variational auto-encoder (VAE), stochastic recurrent neural network, sequence-to-sequence model, attention mechanism, end-to-end network, stochastic temporal convolutional network, temporal difference VAE, normalizing flow and neural ordinary differential equation. Enhancing the prior/posterior representation is addressed in different latent variable models. We illustrate how these models are connected and why they work for a variety of applications on complex patterns in language and vision. The word, sentence and image embeddings are merged with semantic constraint or structural information. Bayesian learning is formulated in the optimization procedure where the posterior collapse is tackled. An informative latent space is trained to incorporate deep Bayesian learning in various information systems.
The paper describes a new approach to developing image processing methods for medical video systems. The core idea is the personalized workflow of video data processing and analysis. It allows to increase the sensit...
Measuring similarity between two image sets is instrumental in many computer vision tasks, such as video face recognition, multi-shot person re-identification and gait recognition. In most recent works, this is done by aggregating the embedding features of the images into a fixed-size vector and calculating a metric in vector space (e.g., Euclidean distance). The embedding feature function can be learned with a deep metric learning (DML) technique. However, methods relying on feature aggregation fail to capture the diversity and uncertainty within image sets. In this paper, we obviate the need for feature aggregation and propose a novel Statistical Distance Metric Learning (SDML) framework, which represents each image set as a probability distribution in the embedding feature space and compares two image sets by the statistical distance between their distributions. Among all types of statistical distance, we choose Jeffrey's divergence (JD), which can be obtained from two embedding feature sets by a kNN-based density estimator. We also design a statistical centroid loss function to enhance the discriminative power of the training process. Our SDML framework naturally preserves the diversity within an image set and the relation between two sets. We evaluate the proposed approach on gait recognition and multi-shot person re-id. The experimental results show that SDML outperforms conventional DML and achieves competitive or superior performance compared with previous state-of-the-art methods on these tasks.
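To make the idea of a kNN-based estimate of Jeffrey's divergence concrete, here is a minimal sketch. It uses a well-known k-nearest-neighbor estimator of the Kullback-Leibler divergence and symmetrizes it; whether the paper uses this exact estimator is not stated in the abstract, so the estimator choice, the function names, and the default `k=5` are assumptions. The sketch also assumes no duplicate points within a set.

```python
import numpy as np

def _kth_nn_distance(queries, points, k, exclude_self=False):
    """Distance from each query to its k-th nearest neighbor among points."""
    diffs = queries[:, None, :] - points[None, :, :]
    dists = np.sqrt(np.sum(diffs**2, axis=-1))
    dists.sort(axis=1)
    # When queries and points are the same set, column 0 is the zero
    # self-distance, so the k-th true neighbor sits at column k.
    return dists[:, k if exclude_self else k - 1]

def knn_kl_divergence(x, y, k=5):
    """kNN estimate of KL(P || Q) from samples x ~ P (n x d) and y ~ Q (m x d)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = x.shape
    m = y.shape[0]
    rho = _kth_nn_distance(x, x, k, exclude_self=True)  # within-set distances
    nu = _kth_nn_distance(x, y, k)                      # cross-set distances
    return float(d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1)))

def jeffreys_divergence(x, y, k=5):
    """Jeffrey's divergence as the symmetrized KL divergence."""
    return knn_kl_divergence(x, y, k) + knn_kl_divergence(y, x, k)

# Hypothetical usage: embedding sets drawn from the same or shifted Gaussians.
rng = np.random.default_rng(0)
set_a = rng.normal(0.0, 1.0, size=(200, 2))
set_b = rng.normal(0.0, 1.0, size=(200, 2))
set_c = rng.normal(3.0, 1.0, size=(200, 2))
jd_same = jeffreys_divergence(set_a, set_b)
jd_diff = jeffreys_divergence(set_a, set_c)
```

Because the estimate is built from every sample in both sets, no aggregation of each set into a single vector is needed, which is the property the abstract emphasizes.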
ISBN: 9781728172361 (print)
As one of the key technologies for sixth generation (6G) mobile communications, the intelligent reflecting surface (IRS) has the advantages of low power consumption, low cost, and simple design methods. However, channel modeling remains an open issue in this field. In this paper, we propose a three-dimensional (3D) geometry-based stochastic model (GBSM) for a massive multiple-input multiple-output (MIMO) communication system employing an IRS. The model supports the movements of the transmitter, the receiver, and the clusters. The evolution of clusters on the linear array and the planar array is also considered in the proposed model. In addition, the generation of the reflecting coefficient is incorporated into the model, and the path loss of the sub-channel assisted by the IRS is also proposed. The steering vector is set up at the base station for cooperation with the IRS. The non-stationary properties are verified by studying statistical properties such as the temporal autocorrelation function and the space correlation function. The good agreement between the simulation results and the analytical results illustrates the correctness of the proposed channel model.
ISBN: 9789082797091 (digital)
ISBN: 9781665467995 (print)
To increase the robustness of Acoustic Scene Classification (ASC) during foreground speech presence, we recently proposed a noise-floor based iVector framework exploiting the statistical estimate of the background signal spectrum. Thereby, ASC accuracy was greatly improved when foreground speech was predominant, at the cost of poorer performance in scenarios with low foreground speech levels. A soft Voice Activity Detector (softVAD) is introduced here to improve this trade-off. Three possibilities are investigated: (a) a segment-wise, weighted score fusion system, yielding a softVAD-based weighted average of the output scores of the (classical) iVector framework and those of the noise-floor based iVector framework; (b) the introduction of weighted Baum-Welch statistics in the iVector extraction stage, with weights that emphasize the background-dominant frames and disregard speech-dominant frames in the test sequence. Based on the performance of these alternatives, a third approach (approach (c)) that performs segment-level score fusion of the frame-wise weighted statistics (approach (b)) and the noise-floor system is proposed. Experiments conclusively demonstrate that all proposals significantly improve the classification accuracy. In particular, the last approach outperforms all other methods in a wide range of experimental conditions.
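Approach (a) can be pictured as a per-segment convex combination of two score matrices. The sketch below is only an illustration under assumptions: the abstract does not specify how the softVAD output is mapped to a fusion weight, so here the weight is taken to be a per-segment speech-presence value in [0, 1] that favors the noise-floor system when speech dominates. The function name `fuse_scores` is hypothetical.

```python
import numpy as np

def fuse_scores(classical_scores, noise_floor_scores, speech_weight):
    """Segment-wise weighted average of two systems' class scores.

    classical_scores, noise_floor_scores: arrays of shape (segments, classes).
    speech_weight: array of shape (segments,), assumed to be a softVAD-derived
    speech-presence value in [0, 1]; higher values trust the noise-floor system.
    """
    classical = np.asarray(classical_scores, dtype=float)
    noise_floor = np.asarray(noise_floor_scores, dtype=float)
    w = np.clip(np.asarray(speech_weight, dtype=float), 0.0, 1.0)[:, None]
    return (1.0 - w) * classical + w * noise_floor

# Hypothetical usage: three segments, two scene classes.
classical = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
noise_floor = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
fused = fuse_scores(classical, noise_floor, [0.0, 1.0, 0.25])
predicted = fused.argmax(axis=1)  # per-segment class decision
```

With a weight of 0 the classical system decides alone, with a weight of 1 the noise-floor system decides alone, and intermediate weights blend the two.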