Monotonic regression (MR) is an efficient tool for estimating functions that are monotonic with respect to input variables. A fast and highly accurate approximate algorithm called the GPAV was recently developed for e...
详细信息
Monotonic regression (MR) is an efficient tool for estimating functions that are monotonic with respect to input variables. A fast and highly accurate approximate algorithm called the GPAV was recently developed for efficient solving large-scale multivariate MR problems. When such problems are too large, the GPAV becomes too demanding in terms of computational time and memory. An approach, that extends the application area of the GPAV to encompass much larger MR problems, is presented. It is based on segmentation of a large-scale MR problem into a set of moderate-scale MR problems, each solved by the GPAV. The major contribution is the development of a computationally efficient strategy that produces a monotonic response using the local solutions. A theoretically motivated trend-following technique is introduced to ensure higher accuracy of the solution. The presented results of extensive simulations on very large data sets demonstrate the high efficiency of the new algorithm. (C) 2011 Elsevier B.V. All rights reserved.
Recently, the methods used to estimate monotonic regression (MR) models have been substantially improved, and some algorithms can now produce high-accuracy monotonic fits to multivariate datasets containing over a mil...
详细信息
Recently, the methods used to estimate monotonic regression (MR) models have been substantially improved, and some algorithms can now produce high-accuracy monotonic fits to multivariate datasets containing over a million observations. Nevertheless, the computational burden can be prohibitively large for resampling techniques in which numerous datasets are processed independently of each other. Here, we present efficient algorithms for estimation of confidence limits in large-scale settings that take into account the similarity of the bootstrap or jackknifed datasets to which MR models are fitted. In addition, we introduce modifications that substantially improve the accuracy of MR solutions for binary response variables. The performance of our algorithms is illustrated using data on death in coronary heart disease for a large population. This example also illustrates that MR can be a valuable complement to logistic regression.
In general, the solution to a regression problem is the minimizer of a given loss criterion and depends on the specified loss function. The nonparametric isotonic regression problem is special, in that optimal solutio...
详细信息
In general, the solution to a regression problem is the minimizer of a given loss criterion and depends on the specified loss function. The nonparametric isotonic regression problem is special, in that optimal solutions can be found by solely specifying a functional. These solutions will then be minimizers under all loss functions simultaneously as long as the loss functions have the requested functional as the Bayes act. For the functional, the only requirement is that it can be defined via an identification function, with examples including the expectation, quantile, and expectile functionals. Generalizing classical results, we characterize the optimal solutions to the isotonic regression problem for identifiable functionals by rigorously treating these functionals as set-valued. The results hold in the case of totally or partially ordered explanatory variables. For total orders, we show that any solution resulting from the pool-adjacent-violators algorithm is optimal.
We propose a new method for risk-analytic benchmark dose (BMD) estimation in a dose-response setting when the responses are measured on a continuous scale. For each dose level d, the observation X(d) is assumed to fol...
详细信息
We propose a new method for risk-analytic benchmark dose (BMD) estimation in a dose-response setting when the responses are measured on a continuous scale. For each dose level d, the observation X(d) is assumed to follow a normal distribution: N((d),sigma 2). No specific parametric form is imposed upon the mean (d), however. Instead, nonparametric maximum likelihood estimates of (d) and sigma are obtained under a monotonicity constraint on (d). For purposes of quantitative risk assessment, a hybrid' form of risk function is defined for any dose d as R(d) = P[X(d) < c], where c > 0 is a constant independent of d. The BMD is then determined by inverting the additional risk functionR(A)(d) = R(d) - R(0) at some specified value of benchmark response. Asymptotic theory for the point estimators is derived, and a finite-sample study is conducted, using both real and simulated data. When a large number of doses are available, we propose an adaptive grouping method for estimating the BMD, which is shown to have optimal mean integrated squared error under appropriate designs.
In the present paper we consider the isotonic regression problem with an arbitrary convex distance function d(.), and the main purpose being to present an algorithm for obtaining all isotonic regressions under this re...
详细信息
In the present paper we consider the isotonic regression problem with an arbitrary convex distance function d(.), and the main purpose being to present an algorithm for obtaining all isotonic regressions under this reasonable assumption on d(.). Further, we consider a piece-wise linear distance function d(.) of the type d(t) = C-\t\ for t < 0 and d(t) = C+ \t\ for t greater-than-or-equal-to 0 and get an isotonic pth frctile regression by choosing p = C+ /(C- + C+).
Discrimination of time series is an important practical problem with applications in various scientific fields. We propose and study a novel approach to this problem. Our approach is applicable to cases where time ser...
详细信息
Discrimination of time series is an important practical problem with applications in various scientific fields. We propose and study a novel approach to this problem. Our approach is applicable to cases where time series in different categories have a different "shape." Although based on the idea of feature extraction, our method is not distance-based, and as such does not require aligning the time series. Instead, features are measured for each time series, and discrimination is based on these individual measures. An AR process with a time-varying variance is used as an underlying model. Our method then uses shape measures or, better, measures of concentration of the variance function, as a criterion for discrimination. It is this concentration aspect or shape aspect that makes the approach intuitively appealing. We provide some mathematical justification for our proposed methodology, as well as a simulation study and an application to the problem of discriminating earthquakes and explosions.
In this paper we describe active set type algorithms for minimization of a smooth function under general order constraints, an important case being functions on the set of bimonotone rxs matrices. These algorithms can...
详细信息
In this paper we describe active set type algorithms for minimization of a smooth function under general order constraints, an important case being functions on the set of bimonotone rxs matrices. These algorithms can be used, for instance, to estimate a bimonotone regression function via least squares or (a smooth approximation of) least absolute deviations. Another application is shrinkage estimation in image denoising or, more generally, regression problems with two ordinal factors after representing the data in a suitable basis which is indexed by pairs (i,j)a{1,aEuro broken vertical bar,r}x{1,aEuro broken vertical bar,s}. Various numerical examples illustrate our methods.
Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the ...
详细信息
Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying probabilities of being null for different hypotheses a priori, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis simultaneously. We show that the proposed method has better power than state-of-the-art competitors while controlling the false discovery rate, both empirically and theoretically. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
We consider the problem of optimally quantifying the categories of an ordered response variable under a linear model. The mathematical formulation leads to the maximization of a ratio of quadratic forms subject to lin...
详细信息
We consider the problem of optimally quantifying the categories of an ordered response variable under a linear model. The mathematical formulation leads to the maximization of a ratio of quadratic forms subject to linear inequality constraints. The solution is given by a hierarchical active constraints search algorithm. We prove that the algorithm converges to the global optimum.
The variance of the error term in ordinary regression models and linear smoothers is usually estimated by adjusting the average squared residual for the trace of the smoothing matrix (the degrees of freedom of the pre...
详细信息
The variance of the error term in ordinary regression models and linear smoothers is usually estimated by adjusting the average squared residual for the trace of the smoothing matrix (the degrees of freedom of the predicted response). However, other types of variance estimators are needed when using monotonic regression (MR) models, which are particularly suitable for estimating response functions with pronounced thresholds. Here, we propose a simple bootstrap estimator to compensate for the over-fitting that occurs when MR models are estimated from empirical data. Furthermore, we show that, in the case of one or two predictors, the performance of this estimator can be enhanced by introducing adjustment factors that take into account the slope of the response function and characteristics of the distribution of the explanatory variables. Extensive simulations show that our estimators perform satisfactorily for a great variety of monotonic functions and error distributions.
暂无评论