Purpose: Users often struggle to choose among similar online services. To help them make informed decisions, it is important to establish a service reputation measurement mechanism. User-provided feedback ratings serve as a primary source of information for this mechanism, and ensuring the credibility of user feedback is crucial for reliable reputation measurement. Most previous studies use passive detection to identify false feedback without creating incentives for honest reporting. Therefore, this study aims to develop a reputation measure for online services that provides incentives for users to report [...]. Design/methodology/approach: In this paper, the authors present a method that uses a peer prediction mechanism to evaluate user credibility by applying a strictly proper scoring rule to users' reports. Considering the heterogeneity among users, the authors measure user similarity, identify similar users as peers to assess credibility and calculate service reputation using an improved expectation-maximization algorithm based on user [...]. [...] analysis and experimental results verify that the proposed method motivates truthful reporting, effectively identifies malicious users and achieves high service rating [...]. Originality/value: The proposed method has significant practical value in evaluating the authenticity of user feedback and promoting honest reporting.
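To make the incentive argument concrete, the sketch below illustrates what "strictly proper" means for a scoring rule: a user's expected score is highest when the reported distribution equals the user's true belief. It is a generic illustration, not the authors' peer prediction mechanism; the function names, the choice of the quadratic and logarithmic rules, and the example beliefs are all assumptions made here.

```python
import math

def quadratic_score(report, outcome):
    """Quadratic (Brier-type) scoring rule for a reported distribution."""
    return 2.0 * report[outcome] - sum(p * p for p in report)

def log_score(report, outcome):
    """Logarithmic scoring rule: log-probability assigned to the realized outcome."""
    return math.log(report[outcome])

def expected_score(rule, belief, report):
    """Expected score of `report` when outcomes are drawn from `belief`."""
    return sum(belief[k] * rule(report, k) for k in range(len(belief)))

# Hypothetical belief over three rating levels, and a distorted report.
belief = [0.7, 0.2, 0.1]
distorted = [0.5, 0.3, 0.2]

honest_quadratic = expected_score(quadratic_score, belief, belief)
distorted_quadratic = expected_score(quadratic_score, belief, distorted)
honest_log = expected_score(log_score, belief, belief)
distorted_log = expected_score(log_score, belief, distorted)

print("quadratic rule:", honest_quadratic, "vs", distorted_quadratic)
print("log rule:", honest_log, "vs", distorted_log)
```

Under both rules the honest report earns the higher expected score, which is the property a mechanism of this kind relies on to reward truthful feedback.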
The semi-competing risks data model is a special type of disease-state model that focuses on the association between an intermediate event and a terminal event, and it has proved to be a useful tool for modeling disease progression. Studying the semi-competing risks data model not only allows us to evaluate whether a disease episode is related to death but also provides a toolkit for predicting death, given that the episode occurred at a certain time. However, computation for semi-competing risks models is numerically challenging. The Gamma-Frailty conditional Markov model has been shown to be a computationally efficient model for studying semi-competing risks data. Building on recent advances in this area, this work proposes a non-parametric pseudo-likelihood method equipped with an EM-like algorithm to study semi-competing risks data with event misascertainment under the restricted Gamma-Frailty conditional Markov model. A thorough simulation study is conducted to demonstrate the inferential validity of the proposed method and its numerical stability. The proposed method is applied to a large HIV cohort study, EA-IeDEA, which has a severe death under-reporting issue, to assess the adverse impact of interruptions in ART care on HIV mortality.
Tensors have become prevalent in business applications and scientific studies. It is of great interest to analyze and understand the heterogeneity in tensor-variate observations. We propose a novel tensor low-rank mixture model (TLMM) to conduct efficient estimation and clustering on tensors. The model combines the Tucker low-rank structure in mean contrasts and the separable covariance structure to achieve parsimonious and interpretable modeling. To implement efficient computation under this model, we develop a low-rank enhanced expectation-maximization (LEEM) algorithm. The pseudo E-step and the pseudo M-step are carefully designed to incorporate variable selection and efficient parameter estimation. Numerical results in extensive experiments demonstrate the encouraging performance of the proposed method compared to popular vector and tensor methods. Supplementary materials for this article are available online.
Networks consist of interconnected units, known as nodes, and allow interactions within a system to be described formally. Specifically, bipartite networks depict relationships between two distinct sets of nodes, designated as sending and receiving nodes. An integral aspect of bipartite network analysis often involves identifying clusters of nodes with similar behaviors. The computational complexity of models for large bipartite networks poses a challenge. To mitigate this challenge, we employ a Mixture of Latent Trait Analyzers (MLTA) for node clustering. Our approach extends the MLTA to include covariates and introduces a double EM algorithm for estimation. Applying our method to COVID-19 data, with sending nodes representing patients and receiving nodes representing preventive measures, enables dimensionality reduction and the identification of meaningful groups. We present simulation results demonstrating the accuracy of the proposed method.
To analyze singly-truncated bivariate economic data, we establish a class of singly-truncated bivariate normal distributions by stochastically representing the original bivariate normal random vector as a mixture of the singly-truncated part and its complementary components. Aided by this stochastic representation, we construct two novel, unified and simple algorithms, the expectation-maximization algorithm and the minorization-maximization algorithm, to calculate the maximum likelihood estimates of the means and covariance matrix for the model of interest. In addition, we develop a data augmentation (DA) algorithm for posterior sampling in Bayesian analysis. Both simulation results and two real data applications in economics, corroborated by comparisons with existing methods, demonstrate the effectiveness and stability of the proposed methodologies.
Detecting anomalies in the daily behaviour of a monitored person is of crucial interest for discovering degenerative diseases, medication changes or any other important problem in that person's health. In this work we focus on detecting anomalies in the transit between the different rooms of the person's house. For this purpose, we propose to model the transit times using a mixture of von Mises distributions and to estimate the parameters using the EM algorithm. An extension of the CUSUM algorithm is proposed to detect changes. This extension is based on reformulating the algorithm as a hypothesis test on the model parameters, using the likelihood ratio. To verify the validity of the method, extensive experimentation has been performed.
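As a rough illustration of the estimation step described above, the following sketch fits a two-component von Mises mixture to circular data (for example, times of day mapped to angles) with EM. It is not the authors' implementation: the function names, the synthetic data, the starting values and the closed-form approximation used for the concentration update are all assumptions made here, so the M-step is only approximate.

```python
import math
import random

def bessel_i0(x):
    """Modified Bessel function I0 via its power series (adequate for moderate x)."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def fit_von_mises_mixture(xs, init_mus, n_iter=50):
    """EM for a von Mises mixture; returns (means, concentrations, weights)."""
    k = len(init_mus)
    mus, kappas, weights = list(init_mus), [1.0] * k, [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each angle.
        norms = [2.0 * math.pi * bessel_i0(kap) for kap in kappas]
        resp = []
        for x in xs:
            dens = [weights[j] * math.exp(kappas[j] * math.cos(x - mus[j])) / norms[j]
                    for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: weighted circular mean, then an approximate concentration update.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            c = sum(r[j] * math.cos(x) for r, x in zip(resp, xs))
            s = sum(r[j] * math.sin(x) for r, x in zip(resp, xs))
            weights[j] = nj / len(xs)
            mus[j] = math.atan2(s, c) % (2.0 * math.pi)
            # Mean resultant length, capped to keep the concentration finite.
            rbar = min(math.hypot(c, s) / nj, 0.99)
            # Common closed-form approximation to the exact concentration estimate.
            kappas[j] = rbar * (2.0 - rbar ** 2) / (1.0 - rbar ** 2)
    return mus, kappas, weights

def circular_distance(a, b):
    """Smallest absolute angular difference between two angles in radians."""
    d = abs(a - b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

# Synthetic angles from two hypothetical routines centred at 1.0 and 4.0 radians.
random.seed(0)
data = ([random.vonmisesvariate(1.0, 8.0) for _ in range(300)]
        + [random.vonmisesvariate(4.0, 8.0) for _ in range(300)])
mus, kappas, weights = fit_von_mises_mixture(data, init_mus=[0.5, 3.5])
print("means:", mus, "concentrations:", kappas, "weights:", weights)
```

A fitted mixture of this kind gives the reference likelihood against which a likelihood-ratio change detector such as CUSUM could then be run.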
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."
Existing methods can perform likelihood-based clustering on a multivariate data matrix of ordinal data, using finite mixtures to cluster the rows (observations) of the matrix. These models can incorporate the main effects of individual rows and columns, as well as cluster effects, to model the matrix of responses. However, many real-world applications also include available covariates, which provide insights into the main characteristics of the clusters and determine clustering structures based on both the individuals' similar patterns of responses and the effects of the covariates on the individuals' responses. In our research we have extended the mixture-based models to include covariates and test what effect this has on the resulting clustering structures. We focus on clustering the rows of the data matrix, using the proportional odds cumulative logit model for ordinal data. We fit the models using the Expectation-Maximization algorithm and assess performance using a simulation study. We also illustrate an application of the models to the well-known arthritis clinical trial data set.
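For readers unfamiliar with the proportional odds cumulative logit model mentioned above, the sketch below computes ordinal category probabilities from a set of cut-points and a linear predictor. It is only a minimal illustration under one common sign convention, P(Y <= k) = logistic(cutpoint_k - eta); the function names, cut-points and cluster effects are hypothetical and do not come from the paper.

```python
import math

def logistic(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def proportional_odds_probs(cutpoints, eta):
    """Category probabilities when P(Y <= k) = logistic(cutpoints[k] - eta).

    `cutpoints` must be increasing; `eta` collects cluster and covariate effects.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs

# Hypothetical cut-points for a four-level ordinal response and two row clusters.
cutpoints = [-1.0, 0.5, 2.0]
probs_low = proportional_odds_probs(cutpoints, eta=-0.8)
probs_high = proportional_odds_probs(cutpoints, eta=1.2)

print("cluster with eta = -0.8:", probs_low)
print("cluster with eta = +1.2:", probs_high)
```

Because the same cut-points are shared across clusters, a larger linear predictor shifts probability mass towards the higher categories without changing the ordering of the response levels.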
We introduce a multivariate version of the modified skew-normal distribution, which contains the multivariate normal distribution as a special case. Unlike the Azzalini multivariate skew-normal distribution, this new distribution has a nonsingular Fisher information matrix when the skewness parameters are all zero, and its profile log-likelihood of the skewness parameters is always a non-monotonic function. We study some basic properties of the proposed family of distributions and present an expectation-maximization (EM) algorithm for parameter estimation that we validate through simulation studies. Finally, we apply the proposed model to univariate frontier data and to trivariate wind speed data, and compare its performance with the Azzalini skew-normal model.
We consider interval-censored data with a cured subgroup that arise from longitudinal follow-up studies with a heterogeneous population in which a certain proportion of subjects is not susceptible to the event of interest. We propose a two-component mixture cure model, where the first component, describing the probability of cure, is modeled by a support vector machine-based approach and the second component, describing the survival distribution of the uncured group, is modeled by a proportional hazards structure. Unlike traditional models that rely on a generalized linear model with a known link function for the cure probability, our proposed model provides flexibility in capturing complex effects of covariates on the probability of cure. To estimate the model parameters, we develop an expectation-maximization-based estimation algorithm. We conduct simulation studies and show that our proposed model captures complex effects of covariates on the cure probability better than the traditional logit link-based two-component mixture cure model. This results in more accurate (smaller bias) and more precise (smaller mean square error) estimates of the cure probabilities, which in turn improves the predictive accuracy for the latent cured status. We further show that our model's ability to capture complex covariate effects also improves the estimation results for the survival distribution of the uncured. Finally, we apply the proposed model and estimation procedure to interval-censored data on smoking cessation.
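The two-component structure described above can be sketched as follows. The code evaluates the population survival function of a mixture cure model and the posterior probability that a subject who is still event-free at time t is uncured, which is the kind of weight an EM algorithm for cure models typically computes in its E-step. This is a simplified illustration, not the authors' procedure: the uncured probability is passed in as a plain number rather than produced by a support vector machine, the baseline survival is assumed exponential, and all names and parameter values are hypothetical.

```python
import math

def uncured_survival(t, x, beta, baseline_rate):
    """Proportional hazards survival for the uncured: S0(t) ** exp(beta * x).

    S0 is taken to be exponential with rate `baseline_rate` (an assumption).
    """
    s0 = math.exp(-baseline_rate * t)
    return s0 ** math.exp(beta * x)

def population_survival(t, x, p_uncured, beta, baseline_rate):
    """Mixture cure survival: cured subjects never experience the event."""
    return (1.0 - p_uncured) + p_uncured * uncured_survival(t, x, beta, baseline_rate)

def posterior_uncured_if_event_free(t, x, p_uncured, beta, baseline_rate):
    """P(uncured | no event observed up to time t)."""
    su = uncured_survival(t, x, beta, baseline_rate)
    return p_uncured * su / ((1.0 - p_uncured) + p_uncured * su)

# Hypothetical subject: covariate x = 1, prior uncured probability 0.6.
params = dict(x=1.0, p_uncured=0.6, beta=0.5, baseline_rate=0.2)
weight_at_0 = posterior_uncured_if_event_free(0.0, **params)
weight_at_10 = posterior_uncured_if_event_free(10.0, **params)
plateau = population_survival(100.0, **params)

print("posterior uncured at t=0: ", weight_at_0)
print("posterior uncured at t=10:", weight_at_10)
print("population survival at t=100:", plateau)
```

The longer a subject remains event-free, the lower the posterior probability of being uncured, and the population survival curve levels off at the cure fraction (here 0.4) instead of dropping to zero.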