In general, the solution to a regression problem is the minimizer of a given loss criterion and depends on the specified loss function. The nonparametric isotonic regression problem is special, in that optimal solutions can be found by solely specifying a functional. These solutions will then be minimizers under all loss functions simultaneously as long as the loss functions have the requested functional as the Bayes act. For the functional, the only requirement is that it can be defined via an identification function, with examples including the expectation, quantile, and expectile functionals. Generalizing classical results, we characterize the optimal solutions to the isotonic regression problem for identifiable functionals by rigorously treating these functionals as set-valued. The results hold in the case of totally or partially ordered explanatory variables. For total orders, we show that any solution resulting from the pool-adjacent-violators algorithm is optimal.
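As a concrete illustration of the total-order case, here is a minimal sketch of the classical pool-adjacent-violators algorithm for the mean (squared-error) functional; the function name and toy data are illustrative, and the paper's set-valued treatment of general identifiable functionals is not reproduced.

```python
import numpy as np

def pava_mean(y, w=None):
    """Pool-adjacent-violators algorithm for the mean functional.

    Returns the (weighted) least-squares isotonic fit to y under a total
    order, i.e. argmin sum_i w_i (z_i - y_i)^2  s.t.  z_1 <= ... <= z_n.
    """
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    # Each block stores (fitted value, total weight, number of points covered).
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Pool backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(w1 * v1 + w2 * v2) / (w1 + w2), w1 + w2, n1 + n2])
    # Expand the block solution back to one fitted value per observation.
    return np.concatenate([np.full(n, v) for v, _, n in blocks])

y = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
print(pava_mean(y))  # [1.  2.5 2.5 4.5 4.5]
```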
Single-index models are becoming increasingly popular in many scientific applications as they offer the advantages of flexibility in regression modeling as well as interpretable covariate effects. In the context of su...
Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when the average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
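For intuition about rejection rules of this kind, here is a minimal sketch (not the paper's exact procedure or its EM estimator): hypotheses are ranked by an estimated local false discovery rate built from per-hypothesis prior null probabilities, and as many as possible are rejected while the running average of the statistic stays below the target level. The Beta alternative density, the constant prior null probability, and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import beta

def lfdr_rejections(p, pi0, f1, alpha=0.05):
    """Rank hypotheses by estimated local fdr and reject as many as possible
    while the running mean of the local fdr (an estimate of the marginal FDR
    of the rejection set) stays at or below alpha.

    p   : array of p-values
    pi0 : array of prior null probabilities, one per hypothesis
    f1  : callable, estimated p-value density under the alternative
    """
    p, pi0 = np.asarray(p, float), np.asarray(pi0, float)
    # Under the null, p-values are Uniform(0, 1), so the null density is 1.
    lfdr = pi0 / (pi0 + (1.0 - pi0) * f1(p))
    order = np.argsort(lfdr)
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, len(p) + 1)
    k = (np.max(np.nonzero(running_mean <= alpha)[0]) + 1
         if np.any(running_mean <= alpha) else 0)
    reject = np.zeros(len(p), dtype=bool)
    reject[order[:k]] = True
    return reject

# Toy example: 10% signals with Beta(0.3, 4)-distributed p-values.
rng = np.random.default_rng(0)
p = np.concatenate([rng.uniform(size=900), rng.beta(0.3, 4, size=100)])
pi0 = np.full(1000, 0.9)
print(lfdr_rejections(p, pi0, lambda x: beta.pdf(x, 0.3, 4), alpha=0.05).sum())
```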
ISBN:
(Print) 9783319930312; 9783319930305
The problem of constructing a k-monotone regression is to find a vector z ∈ R^n with the lowest squared error of approximation to a given vector y ∈ R^n (not necessarily k-monotone) under the condition that z is k-monotone. The problem can be rewritten as a convex programming problem with linear constraints. The paper proposes two different approaches for finding a sparse k-monotone regression: a Frank-Wolfe-type algorithm and a k-monotone pool-adjacent-violators algorithm. A software package for this problem is developed and implemented in R, and the proposed algorithms are compared using simulated data.
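For small instances, a reference solution can be obtained by handing the convex program directly to a generic solver. The sketch below does this in Python rather than R, takes k-monotonicity to mean nonnegative k-th order finite differences (the paper's definition may additionally constrain lower-order differences), and is not the Frank-Wolfe-type or pool-adjacent-violators algorithm proposed in the paper.

```python
import numpy as np
from scipy.optimize import minimize, LinearConstraint

def kth_difference_matrix(n, k):
    """Matrix D with n - k rows such that D @ z gives the k-th order
    finite differences of z."""
    D = np.eye(n)
    for _ in range(k):
        D = D[1:, :] - D[:-1, :]
    return D

def k_monotone_regression(y, k):
    """Least-squares projection of y onto the cone of vectors with
    nonnegative k-th finite differences, solved as a generic QP."""
    y = np.asarray(y, float)
    n = len(y)
    cons = LinearConstraint(kth_difference_matrix(n, k),
                            lb=np.zeros(n - k), ub=np.inf)
    res = minimize(lambda z: 0.5 * np.sum((z - y) ** 2),
                   x0=np.full(n, y.mean()),      # feasible starting point
                   jac=lambda z: z - y,
                   hess=lambda z: np.eye(n),
                   constraints=[cons],
                   method="trust-constr")
    return res.x

y = np.array([0.0, 1.0, 0.5, 3.0, 2.5, 6.0])
print(np.round(k_monotone_regression(y, k=2), 3))  # convex (k = 2) fit
```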
Authors:
Yu, Tao; Li, Pengfei; Qin, Jing
Natl Univ Singapore, Dept Stat & Appl Probabil, Block S16 Level 7, 6 Sci Dr 2, Singapore 117546, Singapore
Univ Waterloo, Dept Stat & Actuarial Sci, 200 Univ Ave West, Waterloo, ON N2L 3G1, Canada
NIAID, NIH, 6700B Rockledge Dr, Bethesda, MD 20892, USA
In this paper, we propose a method for estimating the probability density functions in a two-sample problem where the ratio of the densities is monotone. This problem has been widely identified in the literature, but effective solution methods, in which the estimates should be probability densities and the corresponding density ratio should inherit monotonicity, are unavailable. If these conditions are not satisfied, the applications of the resultant density estimates might be limited. We propose estimates for which the ratio inherits the monotonicity property, and we explore their theoretical properties. One implication is that the corresponding receiver operating characteristic curve estimate is concave. Through numerical studies, we observe that both the density estimates and the receiver operating characteristic curve estimate from our method outperform those resulting directly from kernel density estimates, particularly when the sample size is relatively small.
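For context, the sketch below reproduces only the naive baseline the paper compares against: separate kernel density estimates for the two samples, whose ratio is not guaranteed to be monotone. The constrained estimator proposed in the paper is not reconstructed here; the data and names are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Two samples whose true density ratio f1/f0 is monotone (shifted normals).
rng = np.random.default_rng(5)
x0 = rng.normal(0.0, 1.0, 80)   # e.g. non-diseased sample
x1 = rng.normal(1.0, 1.0, 80)   # e.g. diseased sample

# Unconstrained kernel estimates: nothing forces their ratio to be monotone,
# so the induced ROC curve need not be concave.
kde0, kde1 = gaussian_kde(x0), gaussian_kde(x1)
grid = np.linspace(-4.0, 5.0, 200)
ratio = kde1(grid) / kde0(grid)
print("kernel density ratio monotone on the grid:",
      bool(np.all(np.diff(ratio) >= 0)))
```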
Recently, the methods used to estimate monotonic regression (MR) models have been substantially improved, and some algorithms can now produce high-accuracy monotonic fits to multivariate datasets containing over a million observations. Nevertheless, the computational burden can be prohibitively large for resampling techniques in which numerous datasets are processed independently of each other. Here, we present efficient algorithms for estimation of confidence limits in large-scale settings that take into account the similarity of the bootstrap or jackknifed datasets to which MR models are fitted. In addition, we introduce modifications that substantially improve the accuracy of MR solutions for binary response variables. The performance of our algorithms is illustrated using data on death in coronary heart disease for a large population. This example also illustrates that MR can be a valuable complement to logistic regression.
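The sketch below shows only the naive resampling baseline that such efficient algorithms improve upon: each bootstrap dataset is refitted from scratch and pointwise percentile limits are taken. The sample data, function names, and choice of percentile limits are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def naive_bootstrap_band(x, y, grid, n_boot=200, level=0.95, seed=0):
    """Pointwise percentile bootstrap limits for a monotonic regression fit,
    refitting every bootstrap sample independently (the costly baseline)."""
    rng = np.random.default_rng(seed)
    fits = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))   # resample with replacement
        iso = IsotonicRegression(out_of_bounds="clip").fit(x[idx], y[idx])
        fits[b] = iso.predict(grid)
    lo, hi = np.quantile(fits, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return lo, hi

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 400)
y = 0.6 * (x > 0.5) + rng.normal(0, 0.3, 400)        # thresholded response
grid = np.linspace(0, 1, 21)
lo, hi = naive_bootstrap_band(x, y, grid)
print(np.round(lo, 2), np.round(hi, 2), sep="\n")
```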
Group testing, introduced by Dorfman (1943), has been used to reduce costs when estimating the prevalence of a binary characteristic based on a screening test of k groups that include n independent individuals in total. If the unknown prevalence is low and the screening test suffers from misclassification, it is also possible to obtain more precise prevalence estimates than those obtained from testing all n samples separately (Tu et al., 1994). In some applications, the individual binary response corresponds to whether an underlying time-to-event variable T is less than an observed screening time C, a data structure known as current status data. Given sufficient variation in the observed C values, it is possible to estimate the distribution function F of T nonparametrically, at least at some points in its support, using the pool-adjacent-violators algorithm (Ayer et al., 1955). Here, we consider nonparametric estimation of F based on group-tested current status data for groups of size k where the group tests positive if and only if any individual's unobserved T is less than the corresponding observed C. We investigate the performance of the group-based estimator as compared to the individual test nonparametric maximum likelihood estimator, and show that the former can be more precise in the presence of misclassification for low values of F(t). Potential applications include testing for the presence of various diseases in pooled samples where interest focuses on the age-at-incidence distribution rather than overall prevalence. We apply this estimator to the age-at-incidence curve for hepatitis C infection in a sample of U.S. women who gave birth to a child in 2014, where group assignment is done at random and based on maternal age. We discuss connections to other work in the literature, as well as potential extensions.
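A simplified sketch of the two estimators under an assumption of no misclassification: for individual current status data, the NPMLE of F at the observed screening times is the isotonic regression of the binary indicators on C; for group-tested data, the monotone group-positivity probability g(c) = 1 - (1 - F(c))^k is estimated by isotonic regression and back-transformed. The back-transformation step and all names and data are an illustrative reading, not necessarily the paper's exact estimator.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def current_status_npmle(c, delta):
    """NPMLE of F at the observed screening times from individual current
    status data: isotonic regression of the indicators delta on c
    (pool-adjacent-violators, Ayer et al., 1955)."""
    order = np.argsort(c)
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    return c[order], iso.fit(c[order], delta[order]).predict(c[order])

def grouped_current_status_estimate(c_group, z_group, k):
    """Estimate of F from group-tested current status data: a group tests
    positive iff any member's T is below the group's screening time C, so
    g(c) = 1 - (1 - F(c))**k is monotone; estimate g by isotonic regression
    of the group indicators and back-transform."""
    order = np.argsort(c_group)
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    g_hat = iso.fit(c_group[order], z_group[order]).predict(c_group[order])
    return c_group[order], 1.0 - (1.0 - g_hat) ** (1.0 / k)

rng = np.random.default_rng(2)
k, n_groups = 5, 200
c = rng.uniform(0.0, 3.0, n_groups)               # one screening time per group
t = rng.exponential(2.0, size=(n_groups, k))      # unobserved event times
z = (t.min(axis=1) < c).astype(float)             # group test result
c_sorted, F_grp = grouped_current_status_estimate(c, z, k)

c_ind = rng.uniform(0.0, 3.0, 500)                # individual-test comparison
t_ind = rng.exponential(2.0, 500)
_, F_ind = current_status_npmle(c_ind, (t_ind < c_ind).astype(float))
print(F_grp[:5], F_ind[:5], sep="\n")
```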
We propose a new method for risk-analytic benchmark dose (BMD) estimation in a dose-response setting when the responses are measured on a continuous scale. For each dose level d, the observation X(d) is assumed to follow a normal distribution, N(μ(d), σ²). No specific parametric form is imposed upon the mean μ(d), however. Instead, nonparametric maximum likelihood estimates of μ(d) and σ are obtained under a monotonicity constraint on μ(d). For purposes of quantitative risk assessment, a 'hybrid' form of risk function is defined for any dose d as R(d) = P[X(d) < c], where c > 0 is a constant independent of d. The BMD is then determined by inverting the additional risk function R_A(d) = R(d) - R(0) at some specified value of the benchmark response. Asymptotic theory for the point estimators is derived, and a finite-sample study is conducted, using both real and simulated data. When a large number of doses are available, we propose an adaptive grouping method for estimating the BMD, which is shown to have optimal mean integrated squared error under appropriate designs.
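The inversion step can be illustrated as follows, assuming a monotone fit of μ(d) and an estimate of σ are already available on a dose grid; the paper's nonparametric maximum likelihood estimation, asymptotic theory, and adaptive grouping are not reproduced, and the grid, cutoff c, and benchmark response are illustrative values.

```python
import numpy as np
from scipy.stats import norm

def bmd_from_fit(doses, mu_hat, sigma_hat, c, bmr=0.10):
    """Invert the additional-risk function to obtain a benchmark dose.

    R(d)   = P[X(d) < c] = Phi((c - mu(d)) / sigma)   (the 'hybrid' risk)
    R_A(d) = R(d) - R(0)
    The BMD is the smallest dose at which R_A reaches the benchmark response
    bmr; linear interpolation on the dose grid is used here.  mu_hat should
    already be a monotone (e.g. isotonic) fit over `doses`, with doses[0] = 0.
    """
    risk = norm.cdf((c - np.asarray(mu_hat, float)) / sigma_hat)
    add_risk = risk - risk[0]
    if np.all(add_risk < bmr):
        return np.inf                       # BMR not reached on this grid
    j = np.argmax(add_risk >= bmr)          # first grid point at/above the BMR
    if j == 0:
        return doses[0]
    d0, d1, r0, r1 = doses[j - 1], doses[j], add_risk[j - 1], add_risk[j]
    return d0 + (bmr - r0) * (d1 - d0) / (r1 - r0)

doses = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
mu_hat = np.array([10.0, 9.6, 9.1, 8.2, 7.0])   # monotone decreasing mean fit
print(bmd_from_fit(doses, mu_hat, sigma_hat=1.5, c=7.5, bmr=0.10))
```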
ISBN:
(Print) 9781628415063
Classifier scores in many diagnostic devices, such as computer-aided diagnosis systems, are usually on an arbitrary scale, the meaning of which is unclear. Calibration of classifier scores to a meaningful scale such as the probability of disease is potentially useful when such scores are used by a physician or another algorithm. In this work, we investigated the properties of two methods for calibrating classifier scores to probability of disease. The first is a semiparametric method in which the likelihood ratio for each score is estimated based on a semiparametric proper receiver operating characteristic model, and then an estimate of the probability of disease is obtained using Bayes' theorem assuming a known prevalence of disease. The second method is nonparametric, in which isotonic regression via the pool-adjacent-violators algorithm is used. We employed the mean square error (MSE) and the Brier score to evaluate the two methods. We evaluated the methods under two paradigms: (a) the dataset used to construct the score-to-probability mapping function is also used to calculate the performance metric (MSE or Brier score) (resubstitution); (b) an independent test dataset is used to calculate the performance metric (independent). Under our simulation conditions, the semiparametric method is found to be superior to the nonparametric method at small to medium sample sizes, and the two methods appear to converge at large sample sizes. Our simulation results also indicate that the resubstitution bias may depend on the performance metric, and for the semiparametric method, the resubstitution bias is small when a reasonable number of cases (>100 cases per class) are available.
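The nonparametric method is essentially isotonic calibration, which can be sketched with standard tools as below; the simulated scores, the prevalence implicit in the data, and the resubstitution evaluation are illustrative choices, and the semiparametric ROC-model method is not reproduced.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss

# Simulated classifier scores: diseased cases tend to score higher.
rng = np.random.default_rng(3)
y = np.concatenate([np.zeros(500), np.ones(500)])     # 0 = non-diseased, 1 = diseased
scores = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(1.2, 1.0, 500)])

# Nonparametric calibration: isotonic regression (PAVA) of outcome on score
# maps each score to an estimated probability of disease.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
prob_disease = iso.fit(scores, y).predict(scores)

# Resubstitution Brier score (an independent test set would give the other paradigm).
print(brier_score_loss(y, prob_disease))
```

Note that, unlike the semiparametric method, this mapping inherits the disease prevalence of the training data rather than using an externally specified prevalence.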
The variance of the error term in ordinary regression models and linear smoothers is usually estimated by adjusting the average squared residual for the trace of the smoothing matrix (the degrees of freedom of the predicted response). However, other types of variance estimators are needed when using monotonic regression (MR) models, which are particularly suitable for estimating response functions with pronounced thresholds. Here, we propose a simple bootstrap estimator to compensate for the over-fitting that occurs when MR models are estimated from empirical data. Furthermore, we show that, in the case of one or two predictors, the performance of this estimator can be enhanced by introducing adjustment factors that take into account the slope of the response function and characteristics of the distribution of the explanatory variables. Extensive simulations show that our estimators perform satisfactorily for a great variety of monotonic functions and error distributions.
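As one concrete possibility in this spirit (not necessarily the authors' estimator), the sketch below refits the monotonic regression to bootstrap samples and averages squared residuals only over out-of-bag observations, which counteracts the downward bias of the apparent residual variance; the data, names, and out-of-bag scheme are assumptions made for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def oob_bootstrap_variance(x, y, n_boot=100, seed=0):
    """Bootstrap variance estimate for monotonic regression that avoids the
    over-fitting bias of the apparent residual variance: the MR model is
    fitted to each bootstrap sample and squared residuals are averaged only
    over the observations left out of that sample (out-of-bag)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    sq_err, count = 0.0, 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        oob = np.setdiff1d(np.arange(n), idx)
        if oob.size == 0:
            continue
        iso = IsotonicRegression(out_of_bounds="clip").fit(x[idx], y[idx])
        sq_err += np.sum((y[oob] - iso.predict(x[oob])) ** 2)
        count += oob.size
    return sq_err / count

rng = np.random.default_rng(4)
x = rng.uniform(0, 1, 300)
y = np.where(x > 0.5, 1.0, 0.0) + rng.normal(0, 0.4, 300)
print(oob_bootstrap_variance(x, y))   # compare with the true variance 0.4**2 = 0.16
```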