Imputation of Missing Data Using the Combination of Singular Spectrum Analysis Method and Kalman Filter Algorithm and Comparison with Univariate Imputation Methods

Document Type: Original Paper

Authors

Department of Statistics, Payame Noor University, Tehran 19395-4697, Iran.

Abstract

Missing values are a common problem in time series analysis. The more accurately these values are imputed, the better the structure of the series is understood, and consequently pattern recognition and the prediction of future values become more accurate. Choosing an appropriate imputation method is therefore an important part of time series analysis. In this paper, we introduce a new method for imputing missing data that combines the singular spectrum analysis procedure with the Kalman filter algorithm. We then review other methods for imputing missing values in univariate time series and compare all of these methods using data simulated from structural models as well as real data. The comparison, based on root mean square error and mean absolute deviation, shows that the method based on singular spectrum analysis with the Kalman filter algorithm outperforms the other imputation methods, and that the mode method performs worst.
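The evaluation protocol described in the abstract (hide known values, impute them, score the result by root mean square error and mean absolute deviation) can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the simulated series, the missingness pattern, and the two baseline imputers (overall mean and linear interpolation) are assumptions chosen for brevity, and the singular spectrum analysis with Kalman filter method itself is not implemented here.

```python
import numpy as np

def rmse(truth, imputed):
    """Root mean square error between true and imputed values."""
    truth, imputed = np.asarray(truth, float), np.asarray(imputed, float)
    return float(np.sqrt(np.mean((truth - imputed) ** 2)))

def mad(truth, imputed):
    """Mean absolute deviation between true and imputed values."""
    truth, imputed = np.asarray(truth, float), np.asarray(imputed, float)
    return float(np.mean(np.abs(truth - imputed)))

def impute_mean(y):
    """Baseline (assumed): replace every NaN with the mean of the observed values."""
    out = y.copy()
    out[np.isnan(y)] = np.nanmean(y)
    return out

def impute_linear(y):
    """Baseline (assumed): linear interpolation between observed neighbours."""
    out = y.copy()
    idx = np.arange(len(y))
    miss = np.isnan(y)
    out[miss] = np.interp(idx[miss], idx[~miss], y[~miss])
    return out

# Hypothetical structural-style series: random-walk level + seasonal term + noise.
rng = np.random.default_rng(0)
n = 200
t = np.arange(n)
level = np.cumsum(rng.normal(0.0, 0.1, n))
series = level + np.sin(2 * np.pi * t / 12) + rng.normal(0.0, 0.2, n)

# Hide 20 interior points at random so the true values are known for scoring.
miss_idx = rng.choice(np.arange(1, n - 1), size=20, replace=False)
observed = series.copy()
observed[miss_idx] = np.nan

# Score each imputer only on the positions that were hidden.
scores = {}
for name, method in {"mean": impute_mean, "linear": impute_linear}.items():
    filled = method(observed)
    scores[name] = (
        rmse(series[miss_idx], filled[miss_idx]),
        mad(series[miss_idx], filled[miss_idx]),
    )

for name, (r, m) in scores.items():
    print(f"{name:>6}: RMSE={r:.3f}  MAD={m:.3f}")
```

Any other imputer with the same signature (an array with NaN in, a filled array out) could be added to the dictionary and scored on the same hidden positions.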


