Regression Modeling Via T-Lasso Bayesian Method

Document Type : Original Paper

Authors

Department of Statistics, Payame Noor University, P.O.Box 19395-4697, Tehran, Iran

Abstract

Choosing an optimal model is one of the central issues in regression analysis. The purpose of model selection methods in regression is to identify the important explanatory variables, discard the negligible ones, and thereby describe the relationship between the response variable and the explanatory variables more parsimoniously. Because of the limitations of classical variable selection procedures such as stepwise selection, penalized regression methods can be used instead. One such method is Lasso regression, in which the errors are assumed to follow a normal distribution. When the data contain outlying observations, the Student's t distribution can be assumed for the errors, yielding robust estimators. In this article, a variable selection method called the Bayesian T-Lasso regression model is proposed; it extends the Bayesian Lasso regression model to data containing outlying observations. The Bayesian T-Lasso model is developed under two different representations of the Laplace density assigned to the regression coefficients: first as a scale mixture of normal distributions and then as a scale mixture of uniform distributions. We demonstrate the utility of the Bayesian T-Lasso regression through simulation studies and a real data analysis.
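For concreteness, a minimal sketch of the two representations mentioned above, following the standard results of Andrews and Mallows [11] and Mallick and Yi [20], with the t error written as a scale mixture of normals as in Lange et al. [5]; the symbols \lambda, s_j, u_j, \omega_i, \sigma^2 and \nu are illustrative notation, not taken from the paper:

\[
\frac{\lambda}{2}\,e^{-\lambda|\beta_j|}
 = \int_0^{\infty} \frac{1}{\sqrt{2\pi s_j}}\,e^{-\beta_j^2/(2s_j)}\;\frac{\lambda^2}{2}\,e^{-\lambda^2 s_j/2}\,ds_j,
 \qquad \beta_j \mid s_j \sim N(0, s_j),\; s_j \sim \mathrm{Exp}(\lambda^2/2),
\]
\[
\frac{\lambda}{2}\,e^{-\lambda|\beta_j|}
 = \int_{|\beta_j|}^{\infty} \frac{1}{2u_j}\;\lambda^2 u_j\,e^{-\lambda u_j}\,du_j,
 \qquad \beta_j \mid u_j \sim \mathrm{Uniform}(-u_j, u_j),\; u_j \sim \mathrm{Gamma}(2, \lambda),
\]
\[
\varepsilon_i \mid \omega_i \sim N\!\left(0, \sigma^2/\omega_i\right),\qquad
\omega_i \sim \mathrm{Gamma}(\nu/2, \nu/2)
\;\Longrightarrow\; \varepsilon_i \sim t_{\nu}(0, \sigma^2).
\]

Under either representation the full conditional distributions in the resulting hierarchy have standard forms, so the posterior can be explored with an ordinary Gibbs sampler in the spirit of Park and Casella [12].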

Keywords

Main Subjects

References
[1] Norouzirad, M. and Arashi, M. (1396). A study of the limiting behavior of shrinkage estimators in the penalized regression model with rectangular norm. Journal of Statistical Sciences, 11(1), 149-174 (in Persian).
[2] Kazemi, M., Shahsavani, D. and Arashi, M. (1397). Variable selection and structure identification in high dimensions for partially linear additive models. Journal of Statistical Sciences, 12(2), 485-512 (in Persian).
[3] Arast, M., Arashi, M. and Rabiei, M.R. (1398). A study of the behavior of the shrinkage estimator under a linear constraint in the penalized regression model. Journal of Statistical Sciences, 13(1), 1-14 (in Persian).
[4] Taavoni, M. and Arashi, M. (1399). Variable selection in partially linear models with mixed effects for longitudinal data. Journal of Statistical Sciences, 14(2), 367-388 (in Persian).
 
[5] Lange, K.L., et al. (1989). Robust statistical modeling using the t distribution. Journal of the American Statistical Association, 84, 881-896.
[6] Liu, C. and Rubin, D. (1995). ML estimation of the t distribution using EM and its extensions, ECM and ECME. Statistica Sinica, 5, 19-39.
[7] Lin, J.G., et al. (2009). Heteroscedasticity diagnostics for t linear regression models. Metrika, 70, 59-77.
[8] Neter, J., Kutner, M.H., Wasserman, W. and Nachtsheim, C.J. (1996). Applied Linear Regression Models. McGraw-Hill College.
[9] Hoerl, A.E. and Kennard, R.W. (1970). Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12, 55-67.
[10] Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society, Series B, 58, 267-288.
[11] Andrews, D.F. and Mallows, C.L. (1974). Scale Mixtures of Normal Distributions. Journal of the Royal Statistical Society, Series B, 36, 99-102.
[12] Park, T. and Casella, G. (2008). The Bayesian Lasso. Journal of the American Statistical Association, 103, 681-687.
[13] Wang, J.J.J., et al. (2011). Stochastic volatility models with leverage and heavy-tailed distributions: A Bayesian approach using scale mixtures. Computational Statistics and Data Analysis, 55, 852-862.
[14] Spiegelhalter, D.J., et al. (2002). Bayesian measures of model complexity and fit (with discussion). Journal of the Royal Statistical Society, Series B, 64, 583-639.
[15] Harrison, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81-102.
[16] Shapiro, S.S. and Wilk, M.B. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 52, 591-611.
[17] Gelman, A., et al. (2003). Bayesian Data Analysis. Chapman & Hall, London.
[18] Heidelberger, P. and Welch, P.D. (1981). A spectral method for confidence interval generation and run length control in simulations. Communications of the ACM, 24, 233-245.
[19] Ding, P. and Blitzstein, J.K. (2018). On the Gaussian Mixture Representation of the Laplace Distribution. The American Statistician, 72, 172-174.
[20] Mallick, H. and Yi, N. (2014). A New Bayesian Lasso. Statistics and Its Interface, 7, 571-582.
[21] Saleh, A.K.Md.E., Arashi, M. and Tabatabaey, S.M.M. (2014). Statistical Inference for Models with Multivariate t-Distributed Errors. John Wiley, New Jersey.
[22] Hlavackova-Schindler, K. (2016). Prediction Consistency of Lasso Regression Does Not Need Normal Errors. British Journal of Mathematics & Computer Science, 19(4), 2231-0851.
[23] Mallick, H. and Yi, N. (2013). Bayesian Methods for High Dimensional Linear Models. Journal of Biometrics & Biostatistics, 1, 1-13.
[24] Knight, K. and Fu, W. (2000). Asymptotics for Lasso-type Estimators. Annals of Statistics, 28, 1356-1378.
[25] Belloni, A. and Chernozhukov, V. (2013). Least Squares after Model Selection in High-Dimensional Sparse Models. Bernoulli, 19, 521-547.
[26] Fan, J. and Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. Journal of the American Statistical Association, 96, 1348-1360.
[27] Tikhonov, A.N. (1963). Solution of Incorrectly Formulated Problems and the Regularization Method. Doklady Akademii Nauk SSSR, 151, 501-504.
[28] Zou, H. and Hastie, T. (2005). Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, 67, 301-320.
[29] Zou, H. (2006). The Adaptive Lasso and its Oracle Properties. Journal of the American Statistical Association, 101, 1418-1429.
[30] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R.(2004). Least Angle Regression. Annals of Statistics, 32, 407-499.