Abstract
A wide variety of application domains have to deal with incomplete data sets. In particular, data from sensor networks are often incomplete due to factors such as partial system failures or poor measurement conditions. With such massive, incomplete spatio-temporal data sets, it becomes difficult in practice to manipulate the data and to extract knowledge from them. In this paper, we use the so-called Space-Time Principal Component Analysis (STPCA) as a tool to build a reduced-dimension representation of the data set, free of missing values, to which data mining and knowledge extraction algorithms can be applied. The effectiveness of the proposed method is demonstrated on a real vehicle traffic data set containing about 15 million measurements, with a rate of incompleteness on the order of 20% or more. Experiments show that the method behaves well and is strongly robust: it computes a representation of the data that summarizes them while preserving their inherent information.