Researchers often use Principal Component Analysis and Partial Least Squares Regression, an unsupervised and a supervised method, respectively, to extract chemical information in the form of one or more latent variables. However, when the research question is qualitative and requires a figure of merit, these two models focus primarily on the quantitative and continuous information present in the data. In these cases, a valid approach may be to dichotomize the data and to analyze the resulting non-linear data via Non-Linear Principal Component Analysis. However, the results of this method are not always easy to interpret because the solution may be multidimensional. Here we introduce an alternative framework, composed of Rasch modeling and Generalized Linear Mixed Effect Models, to extract information from multivariate binary chemical data with an underlying design of experiment. The model obtained with this framework provides information in a unidimensional representation that can be easily translated into one-dimensional action and control. Furthermore, we show that, through Generalized Linear Mixed Effect Models, it is possible to extend the Rasch model to its multilevel form, which makes it possible to account for each random factor that may be present.
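As a rough illustration of the Rasch model named in this abstract, the sketch below computes the standard Rasch success probability, the logistic function of the difference between a latent trait and an item parameter. The function name and the argument names `ability` and `difficulty` are illustrative assumptions; the abstract does not say how the authors parameterize samples and binary variables.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a positive (1) response under the Rasch model.

    ability    -- latent trait of the sample (hypothetical name)
    difficulty -- location parameter of the binary item (hypothetical name)
    """
    z = ability - difficulty
    # Evaluate the logistic function in a numerically stable way.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

When `ability` equals `difficulty` the probability is exactly one half, and it increases monotonically with `ability - difficulty`, which is what makes the representation unidimensional.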
The venerable method of maximum likelihood has found numerous recent applications in nonparametric estimation of regression and shape constrained densities. For mixture models the nonparametric maximum likelihood estimator (NPMLE) of Kiefer and Wolfowitz plays a central role in recent developments of empirical Bayes methods. The NPMLE has also been proposed by Cosslett as an estimation method for single index linear models for binary response with random coefficients. However, computational difficulties have hindered its application. Combining recent developments in computational geometry and convex optimization, we develop a new approach to computation for such models that dramatically increases their computational tractability. Consistency of the method is established for an expanded profile likelihood formulation. The methods are evaluated in simulation experiments, compared to the deconvolution methods of Gautier and Kitamura, and illustrated in an application to modal choice for journey-to-work data in the Washington DC area. Supplementary materials for this article are available online.
Binary random variables are regarded as random vectors in a binary-field (modulo-2) linear vector space. A characteristic function is defined and related results are derived using this formulation. Minimax estimation of probability distributions using an entropy criterion is investigated, which leads to an A-distribution and bilinear discriminant functions. Nonparametric classification approaches using Hamming distances and their asymptotic properties are discussed. Experimental results are presented.
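The Hamming-distance classification mentioned in this abstract can be illustrated with a plain k-nearest-neighbour rule over binary vectors. This is only a generic sketch, not the paper's procedure; the function names and the `(vector, label)` training format are assumptions.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def knn_classify(train, query, k=1):
    """Label `query` by majority vote among its k nearest training vectors.

    train -- list of (vector, label) pairs (hypothetical format)
    """
    ranked = sorted(train, key=lambda item: hamming(item[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with training pairs `((0, 0, 0), "a")` and `((1, 1, 1), "b")`, the query `(0, 0, 1)` is at distance 1 from the first vector and distance 2 from the second, so the 1-nearest-neighbour rule assigns it to class `"a"`.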
We present an efficient method of calculating exact confidence intervals for the hypergeometric parameter representing the number of "successes," or "special items," in the population. The method inverts minimum-width acceptance intervals after shifting them to make their endpoints nondecreasing while preserving their level. The resulting set of confidence intervals achieves minimum possible average size, and even in comparison with confidence sets not required to be intervals it attains the minimum possible cardinality most of the time, and always within 1. The method compares favorably with existing methods not only in the size of the intervals but also in the time required to compute them. The available R package hyperMCI implements the proposed method.
A common statistical problem encountered in biomedical research is to test the hypothesis that the parameters of k binomial populations are all equal. An exact test of significance of this hypothesis is possible in principle, the appropriate null distribution being a normalized product of k binomial coefficients. However, the problem of computing the tail area of this distribution can be formidable since it requires the enumeration of all sets of k binomial coefficients whose product is less than a given constant. Existing algorithms, all of which rely on explicit enumeration to generate feasible binomial coefficients
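To make the null distribution described in this abstract concrete, the sketch below computes the exact tail area by brute-force enumeration. Conditional on the total number of successes, each outcome has probability equal to the product of its k binomial coefficients divided by a single normalizing binomial coefficient. This is the naive explicit enumeration, feasible only for small problems, and not the algorithm the paper proposes; the function name is an assumption.

```python
from fractions import Fraction
from itertools import product
from math import comb


def exact_homogeneity_pvalue(successes, sizes):
    """Exact p-value for equality of k binomial parameters.

    Sums the conditional null probabilities of all outcomes (with the same
    total number of successes) whose probability does not exceed that of
    the observed outcome.
    """
    total = sum(successes)
    normalizer = comb(sum(sizes), total)

    def weight(xs):
        # Product of the k binomial coefficients for one outcome.
        w = 1
        for x, n in zip(xs, sizes):
            w *= comb(n, x)
        return w

    observed = weight(successes)
    tail = 0
    for xs in product(*(range(n + 1) for n in sizes)):
        if sum(xs) == total and weight(xs) <= observed:
            tail += weight(xs)
    return Fraction(tail, normalizer)
```

With two groups of size 2 and observed successes (2, 0), the outcomes (2, 0) and (0, 2) each have weight 1 out of a total of 6, so the p-value is 1/3, which matches the two-sided Fisher exact test for the corresponding 2×2 table.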
Univariate isotonic regression (IR) has been used for nonparametric estimation in dose-response and dose-finding studies. One undesirable property of IR is the prevalence of piecewise-constant stretches in its estimates, whereas the dose-response function is usually assumed to be strictly increasing. We propose a simple modification to IR, called centered isotonic regression (CIR). CIR's estimates are strictly increasing in the interior of the dose range. In the absence of monotonicity violations, CIR and IR both return the original observations. Numerical examination indicates that for sample sizes typical of dose-response studies and with realistic dose-response curves, CIR provides a substantial reduction in estimation error compared with IR when monotonicity violations occur. We also develop analytical interval estimates for IR and CIR, with good coverage behavior. An R package implements these point and interval estimates. Supplementary materials for this article are available online.
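The piecewise-constant behaviour of ordinary IR described in this abstract can be seen in a minimal pool-adjacent-violators sketch. This implements standard isotonic regression only, not the CIR modification or the interval estimates; the function name and the optional `w` weights argument are assumptions.

```python
def isotonic_regression(y, w=None):
    """Weighted least-squares nondecreasing fit via pool-adjacent-violators."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For the input `[1, 3, 2, 4]` the violating pair (3, 2) is pooled to its mean, giving `[1.0, 2.5, 2.5, 4.0]`, a flat stretch of the kind the abstract calls undesirable. A monotone input such as `[1, 2, 3]` is returned unchanged, consistent with the statement that IR returns the original observations when there are no violations.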
Despite the availability of data leak detection and prevention tools, the number of confidential data leaks caused by insiders is growing. One possible data leak channel is the transfer of encrypted or compressed data, because existing data leak detection tools rely on content analysis methods. This article presents an algorithm for detecting encrypted and compressed data that is based on a statistical model of pseudo-random sequences and detects such data with an accuracy of 0.99.
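The abstract does not specify its statistical model, so the sketch below substitutes a different and much simpler heuristic: the Shannon entropy of the byte histogram. Encrypted and compressed data look close to pseudo-random and therefore tend to score near the maximum of 8 bits per byte. The function name is an assumption, and no decision threshold is implied by the source.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A buffer containing every byte value exactly once reaches the maximum of 8 bits per byte, while a buffer of one repeated byte has entropy 0. Entropy alone cannot distinguish encrypted from compressed data, and it is unreliable on short buffers.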
Link prediction in networks is typically accomplished by estimating or ranking the probabilities of edges for all pairs of nodes. In practice, especially for social networks, the data are often collected by egocentric sampling, which means selecting a subset of nodes and recording all of their edges. This sampling mechanism requires different prediction tools than the typical assumption of links missing at random. We propose a new computationally efficient link prediction algorithm for egocentrically sampled networks, estimating the underlying probability matrix by estimating its row space. We empirically evaluate the method on several synthetic and real-world networks and show that it provides accurate predictions for network links. Supplementary materials, including the code for experiments, are available online.
This paper presents a new stochastic multidimensional scaling vector threshold model designed to analyze “pick any/n” choice data (e.g., consumers rendering buy/no buy decisions concerning a number of actual products). A maximum likelihood procedure is formulated to estimate a joint space of both individuals (represented as vectors) and stimuli (represented as points). The relevant psychometric literature concerning the spatial treatment of such binary choice data is reviewed. The nonlinear probit type model is described, as well as the conjugate gradient procedure used to estimate parameters. Results of Monte Carlo analyses investigating the performance of this methodology with synthetic choice data sets are presented. An application concerning consumer choices for eleven competitive brands of soft drinks is discussed. Finally, directions for future research are presented in terms of further applications and generalizing the model to accommodate three-way choice data.
The purpose of this article is to provide researchers, editors, and readers with a set of guidelines for what to expect in an article using logistic regression techniques. Tables, figures, and charts that should be included to comprehensively assess the results and assumptions to be verified are discussed. This article demonstrates the preferred pattern for the application of logistic methods with an illustration of logistic regression applied to a data set in testing a research hypothesis. Recommendations are also offered for appropriate reporting formats of logistic regression results and the minimum observation-to-predictor ratio. The authors evaluated the use and interpretation of logistic regression presented in 8 articles published in The Journal of Educational Research between 1990 and 2000. They found that all 8 studies met or exceeded recommended criteria.