Abstract
Forecasting extreme values in time series is an important but challenging problem because extreme values are rarely observed, even when a large amount of historical data is available. Modeling extreme values requires a specific focus on estimating the tail distribution of the time series, whose statistical properties may differ from those of its non-extreme values. To address this challenge, we present SimEXT, a novel self-supervised learning framework that learns a robust representation of the time series while preserving the fidelity of its tail distribution. The framework combines contrastive learning with a reconstruction-based autoencoder architecture to learn robust representations of the temporal patterns associated with extreme events. SimEXT also incorporates a wavelet-based data augmentation technique and a distribution-based loss function to prioritize learning the extreme value distribution. We provide probabilistic guarantees for the wavelet-based augmentation, showing that its wavelet coefficients can be perturbed during augmentation without significantly altering the extreme values of the time series. Experimental results on real-world datasets show that SimEXT learns a robust representation of the time series that improves the performance of downstream tasks for forecasting block maxima.
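The abstract does not specify which wavelet transform or perturbation scheme SimEXT uses. As a minimal sketch of why perturbing wavelet coefficients need not move the extreme values much, the example below assumes a one-level orthonormal Haar transform and a hypothetical helper, `perturb_details`, that adds Gaussian noise to the detail coefficients and rescales it to an L2 norm of at most `eps`. Because the transform is orthonormal, the perturbation of the reconstructed series has the same L2 norm as the coefficient noise. This bounds the largest pointwise change, and therefore the change in the series maximum, by `eps`. This is a simple deterministic bound under these assumptions, not the probabilistic guarantee derived for SimEXT.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt(x):
    """One-level orthonormal Haar transform (illustrative; len(x) must be even)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2:
        raise ValueError("series length must be even")
    approx = (x[0::2] + x[1::2]) / SQRT2  # low-frequency (smooth) part
    detail = (x[0::2] - x[1::2]) / SQRT2  # high-frequency (detail) part
    return approx, detail


def haar_idwt(approx, detail):
    """Exact inverse of haar_dwt."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / SQRT2
    x[1::2] = (approx - detail) / SQRT2
    return x


def perturb_details(x, eps, rng):
    """Hypothetical augmentation: add noise with L2 norm <= eps to the
    detail coefficients only, then reconstruct the series."""
    approx, detail = haar_dwt(x)
    noise = rng.standard_normal(detail.size)
    norm = np.linalg.norm(noise)
    if norm > eps:
        noise *= eps / norm  # enforce ||noise||_2 <= eps
    return haar_idwt(approx, detail + noise)


# Orthonormality gives ||x_aug - x||_2 = ||noise||_2 <= eps, and
# |max(x_aug) - max(x)| <= ||x_aug - x||_inf <= ||x_aug - x||_2 <= eps.
rng = np.random.default_rng(0)
x = rng.gumbel(size=256)  # heavy right tail, stands in for a real series
x_aug = perturb_details(x, eps=0.1, rng=rng)
print(abs(x_aug.max() - x.max()) <= 0.1 + 1e-9)
```

Perturbing only the detail coefficients leaves the coarse shape of the series intact. The norm cap is what turns "small perturbation" into an explicit bound on how far the maximum can shift.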