To address the issue of a large placebo effect in certain therapeutic areas, rather than applying the traditional gold-standard parallel-group placebo-controlled design, different versions of the sequential parallel comparison design have been advocated. In general, the design consists of two consecutive stages and three treatment groups. Stage 1 placebo nonresponders potentially form a prespecified patient subgroup for formal between-treatment comparison at the final analysis. In this research, a version of the design is considered for a binary endpoint. To fully utilize all available data, a generalized weighted combination test is proposed in case placebo has a relatively small effect for some of the study endpoints. The weighted combination of the test based on stage 1 data and the test based on stage 2 data of stage 1 placebo nonresponders suggested in the literature uses only a part of the study data and is a special case of this generalized weighted combination test. A multiple imputation approach is outlined for handling data that are missing not at random. Simulations are conducted to evaluate the performance of the methods, and a data example illustrates their application.
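The weighted combination idea can be sketched numerically. The following is a minimal illustration, not the authors' generalized test: two approximately independent standard-normal statistics, one from stage 1 and one from the stage 2 data of stage 1 placebo nonresponders, are combined with a prespecified weight and rescaled to unit variance (the function name, default weight, and one-sided alternative are our assumptions).

```python
import math

def spcd_weighted_z(z1, z2, w=0.6):
    """Combine the stage 1 z-statistic z1 with the stage 2 z-statistic
    z2 (computed on stage 1 placebo nonresponders).  Under H0 the two
    statistics are approximately independent N(0, 1), so the weighted
    sum is rescaled to unit variance; w is prespecified at the design
    stage."""
    z = (w * z1 + (1.0 - w) * z2) / math.sqrt(w**2 + (1.0 - w)**2)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # one-sided p-value
    return z, p
```

With equal weights, two unit z-values combine to sqrt(2), reflecting the gain from pooling two independent sources of evidence.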
Interpolating a skewed conditional spatial random field with missing data is cumbersome in the absence of Gaussianity assumptions. Copulas can capture different types of joint tail characteristics beyond the Gaussian paradigm. Maintaining spatial homogeneity and continuity around the observed random spatial points is also challenging; in particular, when interpolating along a spatial surface, the boundary points demand attention when forming a neighborhood. We therefore bring hierarchical clustering to the spatial random field, developing the copula model with the expectation-maximization algorithm within a Bayesian framework. This article introduces a spatial cluster-based C-vine copula and a modified Gaussian distance kernel to derive a novel spatial probability distribution. To make spatial copula interpolation compatible and efficient, we estimate the parameters by employing different techniques. We apply the proposed spatial interpolation approach to air pollution in Delhi as a case study to demonstrate the newly developed spatial estimation technique.
To address the architecture complexity and ill-posed problems of neural networks when dealing with high-dimensional data, this article presents a Bayesian-learning-based sparse stochastic configuration network (SCN) (BSSCN). The BSSCN inherits the basic idea of training an SCN in the Bayesian framework but replaces the common Gaussian distribution with a Laplace one as the prior distribution of the output weights of the SCN. Meanwhile, a lower bound of the Laplace sparse prior, constructed from a two-level hierarchical prior, is adopted, from which an approximate Gaussian posterior with a sparsity property is obtained. This facilitates training the BSSCN and yields an analytical solution for its output weights. Furthermore, the hyperparameter estimation process is derived by maximizing the corresponding lower bound of the marginal likelihood function based on the expectation-maximization algorithm. In addition, considering the uncertainties caused by both noise in real-world data and model mismatch, a bootstrap ensemble strategy using the BSSCN is designed to construct prediction intervals (PIs) for the target variables. The experimental results on three benchmark data sets and two real-world high-dimensional data sets demonstrate the effectiveness of the proposed method in terms of both prediction accuracy and the quality of the constructed PIs.
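The two-level hierarchy behind such a Laplace prior rests on a standard scale-mixture identity: a zero-mean Gaussian whose variance follows an exponential distribution integrates to a Laplace density. The sketch below (illustrative only; function names and the quadrature scheme are ours, not the paper's) verifies this identity numerically.

```python
import math

def laplace_pdf(w, b=1.0):
    """Closed-form Laplace density with scale b."""
    return math.exp(-abs(w) / b) / (2.0 * b)

def laplace_as_scale_mixture(w, b=1.0, eps=1e-3, upper=30.0, step=1e-3):
    """Numerically integrate the Gaussian scale mixture
        p(w) = integral of N(w | 0, tau) * Exp(tau | rate) dtau,
    with rate = 1 / (2 b^2), which recovers the Laplace(0, b) density."""
    rate = 1.0 / (2.0 * b * b)
    total, tau = 0.0, eps
    while tau < upper:
        gauss = math.exp(-w * w / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)
        total += gauss * rate * math.exp(-rate * tau) * step
        tau += step
    return total
```

Conditioning on the latent variance tau restores Gaussian conjugacy, which is what makes the approximate Gaussian posterior and the analytical output-weight solution tractable.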
We propose a multivariate approach for the estimation of intergenerational transition matrices. Our methodology is grounded in the assumption that individuals' social status is unobservable and must be estimated. In this framework, parents and offspring are clustered on the basis of the observed levels of income and occupational categories, thus avoiding any discretionary rule in the definition of class boundaries. The resulting transition matrix is a function of the posterior probabilities that parents and young adults belong to each class. Estimation is carried out via maximum likelihood by means of an expectation-maximization algorithm. We illustrate the proposed method using National Longitudinal Survey data from the United States over the period 1978-2006.
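The final step, turning posterior class-membership probabilities into a transition matrix, admits a compact sketch. This is our illustrative reconstruction, not the authors' code: entry (j, k) accumulates the product of a parent's posterior probability of class j and the paired offspring's posterior probability of class k, then each row is normalized.

```python
def transition_matrix(parent_post, child_post):
    """Estimate a K x K intergenerational transition matrix from
    posterior class-membership probabilities (one row per
    parent-offspring pair).  Entry (j, k) estimates the probability
    that an offspring is in class k given the parent is in class j."""
    K = len(parent_post[0])
    T = [[0.0] * K for _ in range(K)]
    for p_row, c_row in zip(parent_post, child_post):
        for j in range(K):
            for k in range(K):
                T[j][k] += p_row[j] * c_row[k]
    for j in range(K):
        row_sum = sum(T[j])
        if row_sum > 0:
            T[j] = [t / row_sum for t in T[j]]
    return T
```

Because soft (posterior) assignments are used rather than hard class labels, no discretionary boundary rule ever enters the computation.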
The expectation-maximization (EM) algorithm updates all of the parameter estimates simultaneously, which is not applicable to direction of arrival (DOA) estimation in unknown nonuniform noise. In this work, we present several computationally efficient EM-type algorithms, which update the parameter estimates sequentially, for solving both the deterministic and stochastic maximum-likelihood (ML) direction finding problems in unknown nonuniform noise. Specifically, we design a generalized EM (GEM) algorithm and a space-alternating generalized EM (SAGE) algorithm for computing the deterministic ML estimator. Simulation results show that the SAGE algorithm outperforms the GEM algorithm. Moreover, we design two SAGE algorithms for computing the stochastic ML estimator, in which the first updates the DOA estimates simultaneously while the second updates the DOA estimates sequentially. Simulation results show that the second SAGE algorithm outperforms the first one. (c) 2023 Elsevier Inc. All rights reserved.
In the present study, we provide a motivating example with a financial application under the COVID-19 pandemic to investigate autoregressive (AR) modeling and its diagnostics based on asymmetric distributions. The objectives of this work are: (i) to formulate asymmetric AR models and their estimation and diagnostics; (ii) to assess the performance of the parameter estimators and of the local influence technique for these models; and (iii) to provide a tool showing how data following an asymmetric distribution under an AR structure should be analyzed. We take advantage of the stochastic representation of the skew-normal distribution to estimate the parameters of the corresponding AR model efficiently with the expectation-maximization algorithm. Diagnostic analytics are conducted by using the local influence technique with four perturbation schemes. By employing Monte Carlo simulations, we evaluate the statistical behavior of the corresponding estimators and of the local influence technique. An illustration with financial data updated until 2020, analyzed using the methodology introduced in the present work, is presented as an example of effective applications, from which it is possible to explain atypical cases arising from the COVID-19 pandemic.
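The stochastic representation mentioned above is easy to sketch. Assuming the standard skew-normal parameterization SN(xi, omega, alpha) with delta = alpha / sqrt(1 + alpha^2) (the function name and defaults are illustrative):

```python
import math, random

def rskewnorm(n, xi=0.0, omega=1.0, alpha=3.0, seed=0):
    """Draw n samples from SN(xi, omega, alpha) via the stochastic
    representation
        X = xi + omega * (delta * |Z0| + sqrt(1 - delta^2) * Z1),
    where Z0, Z1 are independent standard normals and
    delta = alpha / sqrt(1 + alpha^2)."""
    rng = random.Random(seed)
    delta = alpha / math.sqrt(1.0 + alpha**2)
    tau = math.sqrt(1.0 - delta**2)
    out = []
    for _ in range(n):
        z0, z1 = abs(rng.gauss(0, 1)), rng.gauss(0, 1)
        out.append(xi + omega * (delta * z0 + tau * z1))
    return out
```

Treating the half-normal component |Z0| as a latent variable is what makes the EM updates for the skew-normal AR model tractable.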
In this article, we study a generalization of the two-groups model in the presence of covariates, a problem that has recently received much attention in the statistical literature due to its applicability in multiple hypotheses testing problems. The model we consider allows for infinite-dimensional parameters and offers flexibility in modeling the dependence of the response on the covariates. We discuss the identifiability issues arising in this model and systematically study several estimation strategies. We propose a tuning-parameter-free nonparametric maximum likelihood method, implementable via the expectation-maximization algorithm, to estimate the unknown parameters. Further, we derive the rate of convergence of the proposed estimators; in particular, we show that the finite sample Hellinger risk for every 'approximate' nonparametric maximum likelihood estimator achieves a near-parametric rate (up to logarithmic multiplicative factors). In addition, we propose and theoretically study two 'marginal' methods that are more scalable and easily implementable. We demonstrate the efficacy of our procedures through extensive simulation studies and relevant data analyses, one arising from neuroscience and the other from astronomy. We also outline the application of our methods to multiple testing. The companion R package NPMLEmix implements all the procedures proposed in this article.
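A minimal sketch of NPMLE-via-EM in the covariate-free special case may help fix ideas. This toy version (not the NPMLEmix implementation) restricts the mixing distribution to a fixed support grid and lets EM update only the mixing weights: the E-step computes responsibilities, and the M-step averages them.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def npmle_em(z, grid, n_iter=200):
    """EM for the nonparametric MLE of a mixing distribution supported
    on a fixed grid: f(z_i) = sum_k w_k * phi(z_i - grid_k).
    E-step: responsibilities r_ik; M-step: w_k = mean over i of r_ik."""
    m = len(grid)
    w = [1.0 / m] * m
    for _ in range(n_iter):
        new_w = [0.0] * m
        for zi in z:
            lik = [w[k] * phi(zi - grid[k]) for k in range(m)]
            tot = sum(lik)
            for k in range(m):
                new_w[k] += lik[k] / tot
        w = [v / len(z) for v in new_w]
    return w
```

Each EM iteration increases the mixture log-likelihood, and no tuning parameter is needed beyond the support grid itself.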
Item-level response time (RT) data can be conveniently collected from computer-based test/survey delivery platforms and have been demonstrated to bear a close relation to a miscellany of cognitive processes and test-taking behaviors. Individual differences in general processing speed can be inferred from item-level RT data using factor analysis. Conventional linear normal factor models make strong parametric assumptions, sacrificing modeling flexibility for interpretability, and thus are not ideal for describing complex associations between observed RT and the latent speed. In this paper, we propose a semiparametric factor model with minimal parametric assumptions. Specifically, we adopt a functional analysis of variance representation for the log conditional densities of the manifest variables, in which the main effect and interaction functions are approximated by cubic splines. Penalized maximum likelihood estimation of the spline coefficients can be performed by an expectation-maximization algorithm, and the penalty weight can be determined empirically by cross-validation. In a simulation study, we compare the semiparametric model with incorrectly and correctly specified parametric factor models with regard to the recovery of the data-generating mechanism. A real data example is also presented to demonstrate the advantages of the proposed method.
Spatial transcriptomics has emerged as a powerful technique for resolving gene expression profiles while retaining tissue spatial information. These spatially resolved transcriptomics make it feasible to examine the complex multicellular systems of different microenvironments. To answer scientific questions with spatial transcriptomics and expand our understanding of how cell types and states are regulated by the microenvironment, the first step is to identify cell clusters by integrating the available spatial information. Here, we introduce SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field. We have also derived an efficient expectation-maximization algorithm based on an iterative conditional mode for SC-MEB. In contrast to BayesSpace, a recently developed method, SC-MEB is not only computationally efficient and scalable to large sample sizes but is also capable of choosing the smoothness parameter and the number of clusters. We performed comprehensive simulation studies to demonstrate the superiority of SC-MEB over some existing methods. We applied SC-MEB to analyze the spatial transcriptome of human dorsolateral prefrontal cortex tissues and the mouse hypothalamic preoptic region. Our analysis results showed that SC-MEB can achieve a similar or better clustering performance to BayesSpace, which uses the true number of clusters and a fixed smoothness parameter. Moreover, SC-MEB is scalable to large sample sizes. We then employed SC-MEB to analyze a colon dataset from a patient with colorectal cancer (CRC) and COVID-19, and further performed differential expression analysis to identify signature genes related to the clustering results. The heatmap of identified signature genes showed that the clusters identified using SC-MEB were more separable than those obtained with BayesSpace. Using pathway analysis, we identified three immune-related clusters, and in a further comparison, found the mean expression of COVID-19
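The iterated-conditional-modes idea inside such an EM algorithm can be sketched on a one-dimensional toy problem (illustrative only; SC-MEB itself operates on a 2-D spatial lattice with an empirical Bayes treatment of the parameters): each site takes the label minimizing a data-fit term plus a Potts-type penalty for disagreeing with its neighbors.

```python
def icm(y, means, beta=1.0, n_iter=10):
    """Iterated conditional modes on a 1-D chain.  Each point's label
    minimizes a squared-error data term plus a smoothness penalty beta
    for each neighbor with a different label (a Potts prior)."""
    # Initialize with the nearest-mean (no-smoothing) labeling.
    labels = [min(range(len(means)), key=lambda k: (yi - means[k]) ** 2)
              for yi in y]
    for _ in range(n_iter):
        for i in range(len(y)):
            def cost(k):
                c = (y[i] - means[k]) ** 2
                if i > 0 and labels[i - 1] != k:
                    c += beta
                if i < len(y) - 1 and labels[i + 1] != k:
                    c += beta
                return c
            labels[i] = min(range(len(means)), key=cost)
    return labels
```

With beta = 0 this reduces to nearest-mean assignment; a positive beta smooths away isolated labels, which is the role the smoothness parameter plays in the spatial clustering.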
In epidemiological studies, it is easier to collect data only from individuals whose failure events are within a calendar time interval, the so-called interval sampling, which leads to doubly truncated data. In many situations, the calendar time of the failure event can only be recorded within time intervals, leading to doubly truncated and interval censored (DTIC) data. Firstly, we point out that although the existing methods for DTIC data work adequately under the sampling scheme (Scheme 1) for doubly truncated data, Scheme 1 is not realistic for DTIC data. Secondly, we consider a commonly used sampling scheme (Scheme 2), under which the individuals are included in the sample based on diagnosis date. We point out that under Scheme 2, due to violation of assumptions for Scheme 1, the NPMLE of the cumulative distribution function is severely biased if the likelihood function for Scheme 1 is used. To overcome this difficulty, we define a target population, under which a sampling scheme (Scheme 3) can be implemented such that appropriate truncation variables can be defined and the NPMLE of the cumulative distribution function can be obtained using the expectation-maximization algorithm. We also consider estimation of the joint distribution function for successive duration times. Using the imputed first failure times based on the NPMLE from Scheme 3, we then obtain the imputed right censored data of the second failure event. Based on the imputed data, we propose a nonparametric estimator of the joint distribution function using the inverse-probability-weighted approach. Simulation studies demonstrate that the proposed method performs well with moderate sample sizes.