In testing the factorial invariance of a measure across groups, the groups are often of different sizes. Large imbalances in group size might affect the results of factorial invariance studies and lead to incorrect conclusions of invariance because the fit function in multiple-group factor analysis includes a weighting by group sample size. The implication is that violations of invariance might not be detected if the sample sizes of the two groups are severely unbalanced. In this study, we examined the effects of group size differences on the results of factorial invariance tests, proposed a subsampling method to address the unbalanced sample size issue in factorial invariance studies, and evaluated the proposed approach in various simulation conditions. Our findings confirm that violations of invariance might be masked under severely unbalanced group sizes and support the use of the proposed subsampling method to obtain accurate results for invariance studies.
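The abstract above does not spell out the subsampling procedure, so the following is only a minimal sketch of one plausible reading: repeatedly draw a random subsample of the larger group that matches the smaller group's size, evaluate a statistic on each balanced pair, and average the results. The function names, the `stat` callback, and the averaging step are assumptions made for illustration; in an actual invariance study `stat` would be a fit statistic from a multiple-group factor model rather than the toy mean difference used in the usage note.

```python
import random
from statistics import mean


def balanced_subsamples(large_group, small_size, n_reps, seed=0):
    """Yield n_reps random subsamples (without replacement) of large_group,
    each containing small_size observations."""
    rng = random.Random(seed)
    for _ in range(n_reps):
        yield rng.sample(large_group, small_size)


def averaged_statistic(large_group, small_group, stat, n_reps=100, seed=0):
    """Average stat(subsample, small_group) over balanced subsamples.

    `stat` is a hypothetical placeholder for whatever test statistic the
    analyst computes on two equally sized groups.
    """
    values = [
        stat(sub, small_group)
        for sub in balanced_subsamples(large_group, len(small_group), n_reps, seed)
    ]
    return mean(values)
```

For example, `averaged_statistic(big, small, lambda a, b: mean(a) - mean(b))` averages a simple mean difference over 100 balanced draws; every draw compares groups of equal size, so neither group dominates a size-weighted criterion.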
This paper proposes the construction of a confidence set for the date of a structural change at the end of a sample in a linear regression model. While the break fraction, that is, the ratio of the number of observations before the break to the sample size, is typically assumed to take a value in the (0, 1) open interval, we consider the case where a permissible break date is included in a fixed number of observations at the end of the sample; thus the break fraction approaches 1 as the sample size goes to infinity. We propose inverting the test for the break date to construct a confidence set while obtaining the critical values by using the subsampling method. By using Monte Carlo simulations, we show that the confidence set proposed in this paper controls the coverage rate well in finite samples, while the average length of the confidence set is comparable to that of existing methods based on asymptotic theory with a fixed break fraction in the (0, 1) interval.
This article contains data on the determinants of farmers' binary choices for manure use (i.e., manure is used or unused) and fertiliser use (i.e., fertiliser is used or unused) in their fields in the semi-arid northern Ethiopian Rift Valley. The data include (i) a schematic diagram that represents local farmers' distinctions of the crop field types in terms of the distance from their houses and soil fertility and (ii) a table that describes a representative farmer's crop sequences and soil fertilisation methods in two consecutive years. A literature review of previous case studies on the determinants of farmers' adoption of manure application techniques, conducted in parts of sub-Saharan Africa where cattle dung is used for manure, is also summarized in a table. Another table shows descriptive statistics of the independent variables used in the empirical analyses. Summary statistics representing model fit for 4 binomial logit models and 4 multinomial logit models are given in a table. The last two tables in this article show the results of the logit analyses. (C) 2017 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).
Background: Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. Results: When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. Conclusion: We derive the sign-sum statistic to obtain a robust gene ranking. The sign-sum statistic gives a more reproducible ranking than the t-statistic. Using simulated data sets, we show that the sign-sum statistic excludes hetero-type genes well. For the real data sets as well, the sign-sum statistic performs well in terms of ranking reproducibility.
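The abstract above says only that the sign-sum statistic "counts the signs of the t-statistics for all possible subsets"; the exact subset scheme is not given. The sketch below is therefore an illustrative reading, not the authors' definition: it enumerates every pair of size-`k` subsets (one from each group) and sums the sign of the two-sample t-statistic. Because the denominator of a t-statistic is positive whenever it is defined, its sign equals the sign of the difference in subset means, which is what the code uses. The function names and the parameter `k` are assumptions.

```python
from itertools import combinations
from statistics import mean


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_sum(cases, controls, k):
    """Sum the signs of the mean differences over all pairs of size-k subsets.

    Assumed reading of the sign-sum idea: the sign of mean(xs) - mean(ys)
    stands in for the sign of the two-sample t-statistic on (xs, ys).
    Use k >= 2 so that a t-statistic would actually be defined.
    """
    total = 0
    for xs in combinations(cases, k):
        for ys in combinations(controls, k):
            total += sign(mean(xs) - mean(ys))
    return total
```

A gene whose expression is consistently higher in cases yields a sum equal to the number of subset pairs, whereas a gene whose sign flips between subsets yields a sum near zero. The enumeration grows combinatorially with group size, so this form is suitable only for small illustrations.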
This paper derives a robust Kalman smoother estimate for the errors-in-variables state space model that is less sensitive to outliers in the sense of the multivariate least trimmed squares (MLTS) method. Since computing the MLTS estimate is a combinatorial optimization problem, a randomized algorithm has been proposed. However, the uniform sampling method has a high computational cost and may lead to a biased estimate. Therefore, we apply the subsampling method. The algorithm presented here is both efficient and easy to implement. Monte Carlo simulation results show the efficiency of the proposed algorithm.
We consider the problem of determining the optimal block (or subsample) size for a spatial subsampling method for spatial processes observed on regular grids. We derive expansions for the mean square error of the subsampling variance estimator, which yield an expression for the theoretically optimal block size. The optimal block size is shown to depend in an intricate way on the geometry of the spatial sampling region as well as characteristics of the underlying random field. Final expressions for the optimal block size make use of some nontrivial estimates of lattice point counts in shifts of convex sets. Optimal block sizes are computed for sampling regions of a number of commonly encountered shapes. Numerical studies are performed to compare subsampling methods as well as procedures for estimating the theoretically best block size.
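To make the role of the block size concrete, here is a minimal sketch of a subsampling variance estimator for the sample mean on a rectangular grid, using all overlapping square blocks of side `b`. This is a common textbook form and is assumed here for illustration; the abstract does not state which estimator or block shape the paper analyses, and the function names are hypothetical.

```python
def block_means(grid, b):
    """Return the mean of every overlapping b-by-b block of a 2-D grid
    (given as a list of equal-length rows)."""
    rows, cols = len(grid), len(grid[0])
    means = []
    for i in range(rows - b + 1):
        for j in range(cols - b + 1):
            total = sum(grid[r][c] for r in range(i, i + b) for c in range(j, j + b))
            means.append(total / (b * b))
    return means


def subsampling_variance(grid, b):
    """Estimate the variance of sqrt(n) * (sample mean) by subsampling.

    Block means are centred at the overall mean and the average squared
    deviation is rescaled by the block volume b * b.
    """
    n = len(grid) * len(grid[0])
    overall = sum(sum(row) for row in grid) / n
    means = block_means(grid, b)
    return b * b * sum((m - overall) ** 2 for m in means) / len(means)
```

Choosing `b` trades bias against variance: small blocks miss long-range dependence, while large blocks leave few distinct subsamples. That trade-off is what an optimal block size is meant to balance.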