This paper proposes a simple, automatic approach to designing a neo-fuzzy neuron for system identification. The approach uses the backfitting algorithm to learn multiple univariate additive models. Each additive model is a zero-order Takagi-Sugeno (T-S) fuzzy system that depends on a single input variable, and there is one additive model per input variable. Together, these zero-order T-S fuzzy models constitute a neo-fuzzy neuron. The model structure used in this paper yields results with good interpretability and accuracy. To validate the performance and effectiveness of the proposed approach, it is applied to 10 benchmark data sets and compared with the extreme learning machine (ELM), support vector regression (SVR), and two algorithms for designing neo-fuzzy neuron systems: an adaptive learning algorithm for neo-fuzzy neuron systems (ALNFN) and a fuzzy Kolmogorov's network (FKN). A paired t-test is also used to determine whether the results of the proposed approach differ statistically from those of ELM, SVR, ALNFN, and FKN. The results indicate that the proposed approach outperforms ELM and FKN on all data sets and outperforms SVR and ALNFN on almost all data sets; that the differences are statistically significant on almost all data sets; and that on most data sets the number of fuzzy rules selected by cross-validation was small, yielding a model with low complexity and good interpretability.
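To make the additive structure concrete, the following NumPy sketch fits a neo-fuzzy neuron by backfitting: for each input in turn, the consequent weights of that input's zero-order fuzzy model are fitted by least squares to the partial residuals left by the other inputs. The specific choices here are assumptions, not details from the abstract: complementary triangular membership functions spaced uniformly over each input's range, a fixed number of sweeps, and the hypothetical helper names `triangular_memberships`, `fit_neo_fuzzy`, and `predict_neo_fuzzy`. The authors' actual procedure may differ.

```python
import numpy as np


def triangular_memberships(x, centers):
    """Complementary triangular membership functions (each row sums to 1).

    Inputs outside [centers[0], centers[-1]] are clipped to the nearest end.
    """
    x = np.clip(x, centers[0], centers[-1])
    n, m = len(x), len(centers)
    # index of the left center of the interval that contains each x
    idx = np.clip(np.searchsorted(centers, x, side="right") - 1, 0, m - 2)
    t = (x - centers[idx]) / (centers[idx + 1] - centers[idx])
    mu = np.zeros((n, m))
    mu[np.arange(n), idx] = 1.0 - t
    mu[np.arange(n), idx + 1] = t
    return mu


def fit_neo_fuzzy(X, y, n_rules=5, n_sweeps=20):
    """Backfitting: one zero-order T-S fuzzy model f_j(x_j) per input (n_rules >= 2)."""
    n, p = X.shape
    centers = [np.linspace(X[:, j].min(), X[:, j].max(), n_rules) for j in range(p)]
    mus = [triangular_memberships(X[:, j], centers[j]) for j in range(p)]
    weights = [np.zeros(n_rules) for _ in range(p)]
    intercept = y.mean()
    fitted = np.zeros((n, p))  # column j holds f_j(x_j) at the training points
    for _ in range(n_sweeps):
        for j in range(p):
            # partial residual: remove the intercept and all other components
            partial = y - intercept - fitted.sum(axis=1) + fitted[:, j]
            w, *_ = np.linalg.lstsq(mus[j], partial, rcond=None)
            # memberships sum to 1, so shifting all weights shifts f_j by the same amount;
            # centering f_j keeps the intercept identifiable
            shift = (mus[j] @ w).mean()
            weights[j] = w - shift
            fitted[:, j] = mus[j] @ weights[j]
    return intercept, centers, weights


def predict_neo_fuzzy(X, intercept, centers, weights):
    """Sum of the intercept and each input's fuzzy model output."""
    out = np.full(X.shape[0], intercept, dtype=float)
    for j in range(X.shape[1]):
        out += triangular_memberships(X[:, j], centers[j]) @ weights[j]
    return out
```

Choosing `n_rules` by cross-validation, as the abstract describes, would amount to wrapping `fit_neo_fuzzy` in a k-fold loop and keeping the value with the lowest validation error.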
This paper develops a multi-task machine learning model that incorporates per-task parametric and nonparametric effects additively. This gives practitioners the flexibility to model each task in a customized manner, improving performance relative to other modern multi-task methods while maintaining a high degree of explainability. We also introduce novel task-diagnostic methods based on the statistical influence of each task on the model's performance, and we propose tests and remedial measures for outlier tasks. The additive multi-task learning model with task diagnostics is evaluated on a well-known real-world multi-task benchmark dataset and shows a significant performance improvement over other modern multi-task methods.
This paper proposes a polynomial structure identification (PSI) method for variable selection and model structure identification in additive models with longitudinal data. First, the backfitting algorithm and zero-order local polynomial smoothing are used to select the important variables in the additive model; the importance of each variable is measured by the inverse of its bandwidth parameter in the nonparametric partial kernel function. Second, the backfitting algorithm and Q-order local polynomial smoothing are used to identify the specific structure of each selected predictor. To account for within-subject correlation in longitudinal data, a two-stage method is proposed for estimating the regression parameters of the identified important variables: (i) parameter estimators for the important variables are first obtained under an independence working model assumption; (ii) generalized estimating equations with a B-spline-based working correlation matrix are then constructed to obtain the final parameter estimators, which improves estimation efficiency. Finally, simulation studies evaluate the performance of the proposed method, and two real-world examples are presented for illustration.
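The two smoothing ingredients named above can be sketched as follows: a local polynomial smoother of order `q` (with `q = 0` giving the zero-order, Nadaraya-Watson case) and a backfitting loop that applies it to partial residuals. This is an illustrative sketch that assumes a Gaussian kernel and per-variable bandwidths `h` fixed by hand (in the PSI method the bandwidths are data-driven and serve as the importance measure). It does not implement the variable-selection rule or the longitudinal GEE stage, and the function names are hypothetical.

```python
import numpy as np


def local_poly_smooth(x, r, h, q=0):
    """Order-q local polynomial fit of r on x with a Gaussian kernel, evaluated at each x.

    q = 0 gives the Nadaraya-Watson (zero-order) estimator.
    """
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    fitted = np.empty_like(r)
    for i, x0 in enumerate(x):
        d = x - x0
        sw = np.exp(-0.25 * (d / h) ** 2)  # square root of Gaussian weights exp(-0.5 (d/h)^2)
        design = np.vander(d, q + 1, increasing=True)  # columns 1, d, ..., d^q
        beta, *_ = np.linalg.lstsq(design * sw[:, None], r * sw, rcond=None)
        fitted[i] = beta[0]  # local intercept = fitted value at x0
    return fitted


def backfit(X, y, h, q=0, n_sweeps=10):
    """Backfitting for y = alpha + sum_j f_j(X[:, j]) + error, with bandwidth h[j] per variable."""
    n, p = X.shape
    alpha = y.mean()
    f = np.zeros((n, p))
    for _ in range(n_sweeps):
        for j in range(p):
            # partial residual for component j
            partial = y - alpha - f.sum(axis=1) + f[:, j]
            fj = local_poly_smooth(X[:, j], partial, h[j], q)
            f[:, j] = fj - fj.mean()  # center each component for identifiability
    return alpha, f
```

A local polynomial of order `q` reproduces any polynomial of degree at most `q` exactly, which is one reason higher-order fits help reveal whether a selected predictor enters linearly, quadratically, and so on.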
Changes in the levels of multiple time series, based on their common components, help characterize the shared behavior of the data-generating process when known events perturb their movements. However, changes in the variance of the error structure lead to model misspecification, which complicates statistical inference. Volatility structures can incorporate variance into time series models, but they easily lead to overparameterization with multiple time series. A model with inherent changes in variance structure is used to develop a bootstrap-based test for the presence of changes in the variance of multiple time series. Simulation studies show that the test is correctly sized and powerful compared with a CUSUM-based test across a wide range of scenarios. The test is especially advantageous in the presence of strong autocorrelation, and even with unbalanced data. Using global stock prices, with the onset of COVID-19 lockdowns as the stimulus for the changepoint, the test detected significant changes in error variance before and during the pandemic. While the CUSUM-based test also detected significant changes in the multiple time series, it misinterpreted a linear trend as evidence of a change in the variance of stock prices.
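As a rough illustration of a bootstrap test for a variance change at a known changepoint, the sketch below uses a moving block bootstrap of residuals, one common way to respect autocorrelation. It is not the model-based test from the abstract: the statistic (mean absolute log variance ratio across series, after removing separate means before and after the changepoint `tau`), the block length, and the function names are assumptions made for this example.

```python
import numpy as np


def variance_change_stat(Y, tau):
    """Y: (n_series, T) array; tau: known changepoint index (0 < tau < T).

    Removes a separate mean before and after tau (allowing a level shift) and returns the
    mean absolute log variance ratio across series, plus the residual matrix.
    """
    before, after = Y[:, :tau], Y[:, tau:]
    rb = before - before.mean(axis=1, keepdims=True)
    ra = after - after.mean(axis=1, keepdims=True)
    stat = np.mean(np.abs(np.log(ra.var(axis=1) / rb.var(axis=1))))
    return stat, np.hstack([rb, ra])


def bootstrap_variance_test(Y, tau, n_boot=499, block=10, seed=0):
    """Moving block bootstrap p-value for H0: no change in error variance at tau."""
    rng = np.random.default_rng(seed)
    stat, resid = variance_change_stat(Y, tau)
    T = resid.shape[1]
    n_blocks = -(-T // block)  # ceiling division
    exceed = 0
    for _ in range(n_boot):
        # drawing blocks from the whole period scrambles any variance change (null world);
        # using the same time indices for every series preserves cross-series dependence
        starts = rng.integers(0, T - block + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block)).ravel()[:T]
        boot_stat, _ = variance_change_stat(resid[:, idx], tau)
        exceed += boot_stat >= stat
    return stat, (exceed + 1) / (n_boot + 1)
```

The block length trades off preserving autocorrelation (longer blocks) against the variety of resampled series (shorter blocks); a value of 10 is only a placeholder.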
Semiparametric models hold promise for addressing many challenges to statistical inference that arise in real-world applications, but their novelty and theoretical complexity make estimation challenging. Taking advantage of the broad applicability of semiparametric models, we propose novel and improved methods for estimating the regression coefficients of generalized partially linear models (GPLM). This model extends the generalized linear model by adding a nonparametric component. As in parametric models, variable selection is important in the GPLM for identifying covariates that are inactive for the response. Instead of deleting inactive covariates, our approach uses them as auxiliary information in the estimation procedure. We define two models, one that includes all covariates and another that includes only the active covariates, and then optimally combine the two model estimators to form pretest and shrinkage estimators. We study the asymptotic properties of the proposed estimators and derive their asymptotic biases and risks. We show that when the shrinkage dimension exceeds two, the asymptotic risks of the shrinkage estimators are strictly less than those of the full-model estimators. Extensive Monte Carlo simulation studies examine the finite-sample performance of the proposed estimation methods, which we then apply to two real data sets. Both the simulation and real-data results show that the proposed estimators estimate the GPLM regression parameters with higher accuracy and lower variability than competing estimation methods.
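In the shrinkage-estimation literature, pretest and Stein-type shrinkage estimators are commonly built from a full-model estimator, a submodel estimator, and a test statistic `t_n` for the hypothesis that the inactive coefficients are zero. The sketch below assumes those standard forms (the paper's exact definitions for the GPLM may differ) and takes `t_n` and the critical value as inputs rather than computing them. The factor `(k - 2)`, where `k` is the shrinkage dimension, only produces genuine shrinkage when `k` exceeds two, consistent with the condition stated above.

```python
import numpy as np


def pretest_and_shrinkage(beta_fm, beta_sm, t_n, k, crit):
    """Combine full-model (beta_fm) and submodel (beta_sm) estimators.

    t_n:  test statistic for H0 "inactive coefficients are zero"
    k:    shrinkage dimension (number of coefficients restricted in the submodel), k > 2
    crit: critical value of the test, e.g. about 7.815 for a 5% chi-square test with 3 df
    """
    beta_fm = np.asarray(beta_fm, dtype=float)
    beta_sm = np.asarray(beta_sm, dtype=float)
    diff = beta_fm - beta_sm
    # pretest: use the submodel unless the test rejects H0
    beta_pt = beta_fm - diff * (t_n <= crit)
    # Stein-type shrinkage from the full model toward the submodel
    factor = 1.0 - (k - 2) / t_n
    beta_s = beta_sm + factor * diff
    # positive-part version avoids over-shrinking when t_n < k - 2
    beta_ps = beta_sm + max(factor, 0.0) * diff
    return beta_pt, beta_s, beta_ps
```

The pretest estimator makes an all-or-nothing choice between the two models, whereas the shrinkage estimators move smoothly toward the submodel as the evidence against it weakens (smaller `t_n`).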
Our practical motivation is the analysis of potential correlations between spectral noise current and threshold voltage in common on-wafer MOSFETs. The usual strategy relies on standard techniques based on normal linear regression, which are readily available in all statistical software (both free and commercial). However, these methods are not appropriate here because the assumptions they rely on are not met, so more sophisticated methods are required. We propose a new strategy based on recent nonparametric techniques, which are data-driven and therefore free from questionable parametric assumptions. A backfitting algorithm that accounts for random effects and nonparametric regression is designed and implemented. The nature of the correlation between threshold voltage and noise is examined with a statistical test based on a novel technique that summarizes all the relevant information in the data in a color map. The way the results are presented in this plot makes it easy for non-experts in data analysis to understand what underlies the data. The good performance of the method is demonstrated through simulations, and the method is applied to a real data case in a field where these modern statistical techniques are new and prove very efficient.
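A minimal sketch of backfitting with a random effect, assuming a model y = f(x) + b[group] + e. It alternates a Nadaraya-Watson smoothing step for f with BLUP-style shrunken group means for b, and treats the variance ratio sigma_e^2 / sigma_b^2 as known (an assumption; in practice it would be estimated, for example by REML). The kernel, the bandwidth `h`, and the function names are illustrative choices, not the authors' implementation.

```python
import numpy as np


def nw_smooth(x, r, h):
    """Nadaraya-Watson smoother with a Gaussian kernel, evaluated at the sample points."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return w @ r / w.sum(axis=1)


def backfit_random_effects(x, y, groups, h=0.1, var_ratio=1.0, n_iter=30):
    """Backfitting for y = f(x) + b[group] + e.

    var_ratio = sigma_e^2 / sigma_b^2 is assumed known in this sketch.
    Returns the fitted f at each x, the group labels, and the predicted random effects.
    """
    labels, g = np.unique(groups, return_inverse=True)
    n_g = np.bincount(g)
    b = np.zeros(len(labels))
    for _ in range(n_iter):
        f = nw_smooth(x, y - b[g], h)  # smooth the data with the random effects removed
        r = y - f
        # shrunken group means of the residuals: sum_i r_i / (n_g + var_ratio)
        b = np.bincount(g, weights=r) / (n_g + var_ratio)
        b -= b.mean()  # center the random effects; f absorbs the overall level
    f = nw_smooth(x, y - b[g], h)
    return f, labels, b
```

Shrinking each group mean toward zero by `n_g / (n_g + var_ratio)` is what distinguishes a random effect from a fixed group effect: small groups borrow strength from the assumption that effects are centered at zero.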
In this work, we propose a new model, called the generalized symmetrical partial linear model, based on the theory of generalized linear models and symmetrical distributions. In our model, the response variable follows a symmetrical distribution such as the normal, Student-t, or power exponential distribution, among others. Following the generalized linear model framework, we replace the traditional linear predictor with a more general predictor in which one covariate is related to the response nonparametrically, that is, without specifying a parametric function. For example, one could imagine a regression model in which the intercept is believed to vary over time or with geographical location. The backfitting algorithm is used to estimate the parameters of the proposed model. We perform a simulation study to assess the behavior of the penalized maximum likelihood estimators, and we use quantile residuals to check the model assumptions. Finally, we analyze a real data set on the pH of rivers in Ireland.
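The backfitting idea can be sketched for the Student-t member of the symmetrical family. The example below assumes a partially linear model y = X beta + f(t) + eps with scaled Student-t errors and fixed degrees of freedom `nu`. It represents f by its values at the observed t points with a second-difference roughness penalty, and alternates EM-style weight updates with a weighted least-squares step for beta and a penalized smoothing step for f. The penalty form (which ignores the spacing of the t values), the smoothing parameter `lam`, and the function name are illustrative assumptions, not the paper's estimator.

```python
import numpy as np


def fit_t_partial_linear(y, X, t, nu=4.0, lam=1e3, n_iter=50):
    """Backfitting sketch for y = X @ beta + f(t) + eps, eps ~ scaled Student-t(nu).

    f is represented by its values at the (assumed distinct) sorted t points and
    penalized by lam * sum of squared second differences (spacing is ignored).
    """
    order = np.argsort(t)
    y, X = y[order], X[order]
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)  # (n-2, n) second-difference matrix
    P = lam * D.T @ D
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    f = np.zeros(n)
    sigma2 = np.var(y - X @ beta)
    for _ in range(n_iter):
        r = y - X @ beta - f
        # EM weights for the Student-t: large residuals get small weights
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        sw = np.sqrt(w)
        # parametric step: weighted least squares of y - f on X
        beta = np.linalg.lstsq(X * sw[:, None], (y - f) * sw, rcond=None)[0]
        # nonparametric step: penalized weighted smoothing of y - X beta
        f = np.linalg.solve(np.diag(w) + P, w * (y - X @ beta))
        sigma2 = np.sum(w * (y - X @ beta - f) ** 2) / n
    f_out = np.empty(n)
    f_out[order] = f  # back to the original observation order
    return beta, f_out, sigma2
```

Here `X` should not contain an intercept column, because the nonparametric component f(t) already absorbs the overall level.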
In multivariate nonparametric regression, additive models are very useful when a suitable parametric model is difficult to find. The backfitting algorithm is a powerful tool for estimating the additive components. However, because of the complexity of the estimators, the asymptotic p-value of the associated test is difficult to calculate without Monte Carlo simulation. Moreover, conventional tests assume that the predictor variables are strictly continuous. In this paper, a new test is introduced for additive components with discrete or categorical predictors, where the model may also contain continuous covariates. The method is also applied to semiparametric regression to test the model's goodness of fit. These tests are asymptotically optimal in terms of the rate of convergence, as they can detect a specific class of contiguous alternatives at the rate n^{-1/2}. An extensive simulation study and a real data example are presented to support the theoretical results.
We incorporate a random clustering effect into the nonparametric version of the Cox proportional hazards model to characterize clustered survival data. Simulation studies provide evidence that clustered survival data can be better characterized by a nonparametric model. The predictive accuracy of the nonparametric model is affected by the number of clusters and by the distribution of the random component that accounts for the clustering effect. As the functional form of the covariate departs from linearity, the nonparametric model becomes increasingly advantageous over its parametric counterpart. Finally, the nonparametric model outperforms the parametric model when the data are highly heterogeneous and/or the model is misspecified.
A nonparametric test for the presence of clustering in survival data is proposed. Assuming a model that incorporates the clustering effect into the Cox proportional hazards model, simulation studies indicate that the procedure is correctly sized and powerful across a reasonably wide range of scenarios. The test for the presence of clustering over time is also robust to model misspecification. With a large number of clusters, the test is powerful even when the data are highly heterogeneous.